Innovative Legal Insights: Case Study on Assessing Generative AI Models in Dialogflow CX Knowledge Base for Crafting an AI Chatbot.

Introduction

When navigating Indian roads, understanding the country’s road rules and regulations is pivotal, especially for newcomers or those unfamiliar with them. Driving without knowledge of these guidelines can lead to risks and violations, so for a safe and compliant driving experience it is imperative to become familiar with India’s road laws before hitting the road. With an increasing number of vehicles on the roads and diverse infrastructural complexities, adherence to road rules is paramount for safe and efficient transportation.

This case study delves into applying advanced technology to tackle these critical issues. Specifically, it explores the integration of a Knowledge Base within Dialogflow—an AI-powered conversational platform—to disseminate comprehensive information regarding India’s road regulations, while concurrently addressing concerns related to traffic control, offenses, penalties, and procedural intricacies.

Significance and Impact

Improving road safety and compliance with traffic laws is imperative for reducing accidents and ensuring smoother traffic flow. By employing innovative technological solutions like Dialogflow’s Knowledge Base, this study endeavors to enhance public awareness of, and access to, essential information. This could potentially reduce violations and contribute to a safer road environment across the country.

Objective

The primary aim of this study is to assess the efficacy of generative AI models, using Dialogflow’s Knowledge Base feature as a platform for disseminating information on India’s road rules and regulations. The goal is a more accessible and user-friendly resource for individuals seeking information about India’s traffic laws.

Approach and Methodology

The methodology involves using Dialogflow’s knowledge base capabilities to structure a knowledge base that covers India’s road rules and regulations, and then building a generative agent on top of it. The data is held in Google Cloud Storage and consists solely of unstructured PDFs, with no website links or structured data.

The Dialogflow CX console offers a selection of generative AI models. We used each of these foundation models in turn to power the agent, so that it could understand and respond to user queries in a natural and informative manner. The result is a user-friendly AI chatbot that lets anyone ask about any aspect of Indian road rules and regulations and receive comprehensive answers.

We tested the chatbot with a predetermined set of questions about Indian road rules and regulations, repeating the evaluation with each available generative AI model. The questions covered topics such as driving when mentally or physically unfit to drive, offences relating to licences, dangerous driving, and other offences. In this way we explore the impact of each generative AI model on the chatbot’s performance in responding to user queries.
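The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness used in the study: `evaluate_models`, `ask`, and `grade` are hypothetical names. `ask` stands in for whatever sends a question to the agent configured with a given model, and `grade` stands in for a reviewer’s judgement of whether an answer is correct.

```python
from typing import Callable, Dict, List

# Hypothetical signatures (assumptions, not part of any Dialogflow API):
#   ask(model, question)   -> the agent's answer text for that model
#   grade(question, answer) -> True if a reviewer judges the answer correct
AskFn = Callable[[str, str], str]
GradeFn = Callable[[str, str], bool]


def evaluate_models(models: List[str], questions: List[str],
                    ask: AskFn, grade: GradeFn) -> Dict[str, float]:
    """Return the percentage of correctly answered questions per model."""
    if not questions:
        raise ValueError("at least one question is required")
    accuracy = {}
    for model in models:
        # Ask every question with this model and count the correct answers.
        correct = sum(1 for q in questions if grade(q, ask(model, q)))
        accuracy[model] = 100.0 * correct / len(questions)
    return accuracy


# Stand-in functions so the sketch runs without a live agent.
def stub_ask(model: str, question: str) -> str:
    canned = {"model-a": "The limit is 30 mg per 100 ml of blood."}
    return canned.get(model, "I didn't get that. Can you repeat?")


def stub_grade(question: str, answer: str) -> bool:
    return "30 mg" in answer


questions = ["Legal blood alcohol limit for driving in India"]
print(evaluate_models(["model-a", "model-b"], questions, stub_ask, stub_grade))
```

In practice `ask` would wrap a call to the Dialogflow CX agent and `grade` would record a manual review; the stubs are used here only so the sketch is self-contained.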

The generative AI models supported in the Dialogflow CX console are:

  1. text-bison@001
  2. text-bison@002
  3. text-bison@001 tuned (conversational)
  4. text-bison@001 tuned (informational)
  5. Gemini-pro

The table summarizes the key details of each model for an easy comparison.

| Model Name | Description | Key Characteristics | Language Support |
|---|---|---|---|
| text-bison@001 | A larger model with enhanced capabilities for text generation and language understanding. | More comprehensive knowledge base, potential for greater accuracy and fluency. | Available in all supported languages. |
| text-bison@001 tuned (conversational) | Fine-tuned for engaging and coherent conversations. | Prioritizes natural language flow, conversational style, and engaging responses. | Only English is supported at the moment. |
| text-bison@001 tuned (informational) | Fine-tuned for providing concise and informative answers to questions. | Focuses on accuracy, factual correctness, and clarity in responses. | Only English is supported at the moment. |
| text-bison@002 | A newer generation model with further improvements in accuracy and fluency. | Enhanced capabilities for language understanding and text generation. | Available in all supported languages. |
| Gemini Pro | Designed to handle natural language tasks, multi-turn text and code chat, and code generation. | Excels in complex tasks, code understanding, and flexibility for multi-turn interactions. | Available in all supported languages. |

1. text-bison@001

The name of the PaLM 2 Text Bison large language model is text-bison. It’s a foundational model that excels across diverse natural language tasks, including entity extraction. The text-bison@001 model has several use cases, including:

  • Summarizing documents to create shorter versions with key details.
  • Answering questions based on the content of a text.
  • Classifying texts by assigning labels.

Based on evaluations using the text-bison@001 model and focusing on specific aspects of traffic regulations, we draw the following conclusions about the AI chatbot’s effectiveness:

  • The text-bison@001 model frequently fails to provide accurate answers to user inquiries, even when the answers are available within the knowledge base.
  • Despite testing across many different scenarios, the model’s accuracy fell short of what we had hoped for.
  • When responses are generated, they often lack precision or reliability, indicating a significant challenge in interpreting and presenting relevant information within the context of India’s intricate traffic regulations.
  • Across a variety of questions concerning road rules and regulations, we observed an overall accuracy of 30% for the text-bison@001 model.

2. text-bison@001 tuned (informational)

text-bison@001 tuned (informational) is a tuned version of the PaLM (Pathways Language Model) model, specifically optimized for informational tasks. It is ideal for tasks requiring direct and concise information retrieval. It might be more prone to errors when dealing with ambiguous or unclear prompts, as it prioritizes factual accuracy.

An evaluation of the AI chatbot, using text-bison@001 tuned (informational) and specific questions, resulted in the following conclusions:

  • The text-bison@001 tuned (informational) model shows shortcomings when handling NLP-based inquiries, notably within the domain of India’s road rules and regulations.
  • The model frequently supplies concise information in response to user queries.
  • It consistently struggles to handle follow-up questions within the context of India’s intricate traffic laws and regulations.
  • The text-bison@001 tuned (informational) model achieved an overall accuracy of 60%.

3. text-bison@001 tuned (conversational)

text-bison@001 tuned (conversational) is better suited for tasks involving dialogue and creative text generation, and specializes in natural, open-ended conversations. Select text-bison@001 tuned (conversational) for applications demanding natural conversations and creative text generation. For additional information, see Scenarios to use model tuning.

To assess the performance of our AI chatbot, we conducted targeted evaluations employing the text-bison@001 tuned (conversational) model and a set of predefined questions. Our findings are summarized as follows:

  • The model occasionally misrepresented or misunderstood the nuances of Indian road rules, leading to inaccuracies in its responses.
  • There were also occurrences where the model supplemented its answers with additional information not sourced from the provided dataset.
  • The text-bison@001 tuned (conversational) model showed an overall accuracy of 60%.

4. text-bison@002

text-bison@002 is a powerful foundational model within the PaLM 2 family, adept at handling a wide range of natural language tasks.

Key takeaways:

  • text-bison@002 is a versatile and stable foundational language model.
  • It excels in tasks like sentiment analysis, entity extraction, and content generation.
  • Its stable version assures consistent and reliable performance.
  • Millions of parameters underpin its impressive capabilities.

Through targeted evaluations with the text-bison@002 model and pre-defined questions, we gained valuable insights into the AI chatbot’s capabilities. Here are the key takeaways.

  • It occasionally failed to provide answers to queries even though the information was present in the knowledge base.
  • It handles follow-up questions adequately when they include enough detail, but struggles with terse follow-ups. For example:
    Question: Is it compulsory to hold a driving license while driving?
    Answer: Yes, it is compulsory to hold a valid driving license while driving in India.
    Follow-up: Why?
    Answer: I didn’t get that. Can you repeat?

    Alternate phrasing:

    Follow-up question: Why is it compulsory to hold a driving license?
    Answer: As per the Motor Vehicles Act, 1988, it is mandatory to possess a driving license while operating a motor vehicle in any public area. This requirement ensures that drivers possess the necessary expertise, capabilities, and legal authorization to safely operate a vehicle on public roads.
  • The text-bison@002 model exhibited an overall accuracy of 70%, based on our observations.
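The follow-up behaviour in the example above can be checked mechanically by flagging turns where the agent returns its no-match reply. The sketch below is an illustration only: `is_fallback` and `fallback_turns` are hypothetical helper names, and the marker phrases are taken from the single fallback reply quoted in this case study, so other agents may need different wording.

```python
from typing import List, Tuple

# Phrases from the fallback reply observed in this case study (assumption:
# the agent uses the same wording for every unanswered turn).
FALLBACK_MARKERS = ("i didn't get that", "can you repeat")


def is_fallback(answer: str) -> bool:
    """Return True if the answer looks like a no-match fallback reply."""
    # Normalise case and curly apostrophes before matching.
    normalized = answer.lower().replace("\u2019", "'")
    return any(marker in normalized for marker in FALLBACK_MARKERS)


def fallback_turns(conversation: List[Tuple[str, str]]) -> List[int]:
    """Return the 0-based indices of turns whose answer is a fallback."""
    return [i for i, (_, answer) in enumerate(conversation) if is_fallback(answer)]


# The text-bison@002 exchange quoted above, as (question, answer) pairs.
conversation = [
    ("Is it compulsory to hold a driving license while driving?",
     "Yes, it is compulsory to hold a valid driving license while driving in India."),
    ("Why?",
     "I didn’t get that. Can you repeat?"),
    ("Why is it compulsory to hold a driving license?",
     "As per the Motor Vehicles Act, 1988, it is mandatory to possess a driving "
     "license while operating a motor vehicle in any public area."),
]

print(fallback_turns(conversation))  # → [1]
```

Only the terse "Why?" turn is flagged, which matches the observation that the model answers the same follow-up once it is phrased with more detail.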

5. Gemini-pro

The Gemini Pro foundation model is a powerful language model known for its expertise in comprehending and generating language. It supports both single-turn prompt and response interactions and continuous, multi-turn conversations, even extending to tasks involving code understanding and generation.

Limitations of the Gemini Pro model:

  • It does not extract any information beyond the first 2 minutes of a video.
  • It may be less accurate in understanding rotated images.
  • It can struggle with tasks requiring multiple reasoning steps.
  • It may generate incorrect content when parsing extensive text.
  • It is not suitable for interpreting medical images (e.g., x-rays, CT scans) or providing medical advice.

The Gemini Pro model’s performance in answering pre-defined questions related to Indian road rules and regulations provides the following insights into the AI chatbot’s strengths and areas for improvement:

  • Gemini Pro emerges as superior to the previously discussed models.
  • Despite the availability of pertinent data in the knowledge base, there are occasional instances where Gemini Pro fails to respond to user queries.
  • The model is proficient at addressing follow-up questions when those follow-ups include additional specifics or details.
  • The Gemini Pro model displayed an overall accuracy of 90%, as per our observations.

To evaluate the outcomes of individual Generative AI Models in conjunction with a Knowledge base in the Dialogflow CX console, navigate to the Generative AI sub-tab within Agent Settings after selecting the Knowledge base Agent in your GCP project.

Check the link for guidance if you haven’t made the Dialogflow knowledge base yet.

Results

By presenting these results within the Knowledge Base Dialogflow framework, we intend to enhance accessibility, compliance, and understanding of India’s road regulations. These findings serve to underscore the potential and effectiveness of various models in facilitating easier access to crucial information within the realm of Indian road laws.

The evaluation criteria encompassed parameters such as language comprehension, contextual accuracy, response coherence, and the models’ capacity to present information pertinent to Indian road rules within the Knowledge Base Dialogflow environment.
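One way to record these criteria per answer is a simple pass/fail rubric. The sketch below is an assumption about how such judgements could be captured, not the scoring scheme used in this study: `ResponseRating` and `accuracy_percent` are hypothetical names, and the rule that an answer counts as correct only when every criterion passes is our own simplification.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ResponseRating:
    """A reviewer's pass/fail judgement of one answer on each criterion."""
    language_comprehension: bool
    contextual_accuracy: bool
    response_coherence: bool
    relevance_to_road_rules: bool

    def passed(self) -> bool:
        # Assumption: an answer is correct only if every criterion passes.
        return all(getattr(self, f.name) for f in fields(self))


def accuracy_percent(ratings: List[ResponseRating]) -> float:
    """Share of rated answers that pass all criteria, as a percentage."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return 100.0 * sum(r.passed() for r in ratings) / len(ratings)


# Two illustrative ratings: one answer passes everything, one fails on accuracy.
ratings = [
    ResponseRating(True, True, True, True),
    ResponseRating(True, False, True, True),
]
print(accuracy_percent(ratings))
```

A stricter or more lenient rule (for example, weighting contextual accuracy more heavily) would change the resulting percentages, so the combination rule should be fixed before comparing models.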

We measured the accuracy of each language model by running it through the same pre-defined set of questions. The answers Gemini Pro gave to each question are listed below.

Gemini-pro Model Results

Q1: Speed limits for different types of vehicles on Indian roads?

Ans: The speed limits for different types of vehicles on Indian roads are determined by the State Government or an authorized authority. These limits are set to ensure public safety and convenience and may vary depending on the type of road or bridge. Additionally, the National Highways Authority of India or other authorized agencies may impose speed limits on national highways.

Q2: Legal blood alcohol limit for driving in India

Ans: The legal blood alcohol limit for driving in India is 30 mg. per 100 ml. of blood.

Q3: What are the guidelines for using mobile phones while driving in India?

Ans: According to the Motor Vehicle Act, 2019, it is illegal to use handheld communication devices while driving in India.

Q4: Any fine for use on a mobile phone while vehicle driving?

Ans: Yes, there is a fine for using a mobile phone while driving a vehicle. For the first offense, the fine is two thousand rupees. For a second or subsequent offense, the fine is ten thousand rupees.

Q5: What are the consequences for traveling without a proper pass or ticket in a stage carriage according to the Act?

Ans: Whoever travels in a stage carriage without having a proper pass or ticket with him or being in or having alighted from a stage carriage fails or refuses to present for examination or to deliver up his pass or ticket immediately on a requisition being made therefore, shall be punishable with fine which may extend to five hundred rupees.

Q6: What are the repercussions for allowing unauthorized individuals to drive vehicles as per the Act’s provisions?

Ans: The Motor Vehicles (Amendment) Act, 2019 states that allowing unauthorized individuals to drive vehicles is punishable with imprisonment for a term that may extend to three months, or with a fine of five thousand rupees, or with both.

Q7: If we fail to give free passage to the emergency vehicle then?

Ans: According to the Act, failing to allow free passage to emergency vehicles, such as fire service vehicles or ambulances, while driving a motor vehicle is an offense punishable by imprisonment for up to six months, a fine of ten thousand rupees, or both.

Analysis

After testing multiple questions on each model, we’ve identified specific strengths and weaknesses for each, detailed in the table below.

| Sr. No | Model Name | Strengths | Weaknesses |
|---|---|---|---|
| 1 | text-bison@001 | • Responds to user queries from the given knowledge base. | • Inconsistent accuracy, often failing to provide accurate answers when information is available in the knowledge base. • Lack of precision and reliability in responses. |
| 2 | text-bison@001 tuned (informational) | • Offers brief information in response to user queries. | • Limited performance with NLP-based inquiries. • Inability to effectively address follow-up questions. |
| 3 | text-bison@001 tuned (conversational) | • Its skill in understanding detailed legal information proves its ability to grasp important details highlighted in the marked sections. • Generally improved responses compared to text-bison@001. | • Inaccuracy in some responses despite relevant data in the datastore. • Tendency to provide extraneous information occasionally. |
| 4 | text-bison@002 | • Commendable performance in answering various questions. • Adequately addresses follow-up questions. | • Occasional failure to provide answers already in the knowledge base. • Addresses follow-up questions only when they are phrased with more details. |
| 5 | Gemini Pro | • Generally superior performance compared to other models. • Proficient in addressing follow-up questions with additional specifics. | • Inability to respond to some user queries despite relevant data in the knowledge base. |
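The overall accuracy figures reported in the individual model sections can also be placed side by side. The short sketch below simply ranks those reported percentages; `rank_models` is a hypothetical helper name, and the numbers are the observed values from this case study rather than output of any API.

```python
# Overall accuracy (%) as reported in the model sections of this case study.
REPORTED_ACCURACY = {
    "text-bison@001": 30,
    "text-bison@001 tuned (informational)": 60,
    "text-bison@001 tuned (conversational)": 60,
    "text-bison@002": 70,
    "Gemini Pro": 90,
}


def rank_models(accuracy):
    """Return (model, accuracy) pairs sorted from most to least accurate."""
    return sorted(accuracy.items(), key=lambda item: item[1], reverse=True)


# Print one aligned line per model, best first.
for model, score in rank_models(REPORTED_ACCURACY):
    print(f"{model:<40} {score:>3}%")
```

The ranking places Gemini Pro first and text-bison@001 last, with the two tuned variants tied at 60%.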

Conclusion

In comparison to the other models, Gemini Pro emerged as superior in certain aspects. However, there were occasional instances where even Gemini Pro failed to respond to user queries, indicating room for improvement.

To enhance the models’ performance, it is recommended to focus on refining their accuracy, addressing challenges in handling follow-up questions, and minimizing occasional failures. Fine-tuning the models based on user feedback and incorporating additional contextual data may contribute to improved outcomes.