Monday, October 21, 2024

Efficient Multilingual Control of Robotic Dog Using LLM

Introduction

As the world of robotics continues to advance, the integration of artificial intelligence (AI) in robotic systems has become essential for making these machines smarter, more intuitive, and easier to control. One exciting area of development is the use of large language models (LLMs) to enhance the interaction between humans and robots. Recently, a question was raised in an LLM group about how to implement this integration.

The Challenge

The objective was to enable a robotic dog to understand and execute commands given in both English and Cantonese. However, there were key limitations to consider:

  1. Multilingual Capability: The model needed to understand and process commands in both languages accurately.
  2. Edge Device Compatibility: Given that the onboard GPU was a Jetson GPU with only 8GB of VRAM, the model had to be small and efficient enough to run effectively within this limited hardware capacity.
  3. Fast Response Time: The robotic dog should be able to interpret commands and respond almost instantaneously, maintaining a natural interaction experience with users.

To address these challenges, we implemented a PoC that utilized a quantized version of the Qwen 2.5 1.5B model, which provided a balance between size, multilingual capability, and performance.


Why Use a Quantized Version of Qwen 2.5 1.5B Model?

The Qwen 2.5 1.5B model was chosen for several reasons:

  1. Multilingual Capability: The model supports multiple languages, including English and Cantonese. This feature allowed the robotic dog to interpret commands accurately, regardless of the language used.

  2. Efficient Edge Computing: A smaller model was preferred to fit within the constraints of the onboard Jetson GPU. The Qwen 2.5 1.5B model was quantized, reducing its memory footprint and making it lightweight enough for the edge device. Quantization reduces model size by converting the weights from 32-bit floating-point values to smaller data types, such as 4-bit integers, without significantly sacrificing performance.

  3. Optimized for Performance: Despite its smaller size, the model remained powerful enough to handle the command interpretation. By using the quantized version (Qwen2.5-1.5b-instruct-q4_k_m.gguf), it managed to provide a fast response time while consuming minimal VRAM.
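
To see why quantization matters at this scale, a back-of-envelope estimate of the weight memory is instructive. The figures below are rough approximations; real GGUF files mix precisions and add metadata, so actual file sizes differ somewhat.

```python
# Approximate weight memory for a 1.5B-parameter model at two precisions.
# Illustrative arithmetic only; real quantized files are somewhat larger.

params = 1.5e9

fp32_gb = params * 4 / 1e9   # 32-bit floats: 4 bytes per weight
q4_gb = params * 0.5 / 1e9   # 4-bit quantization: ~0.5 bytes per weight

print(f"fp32: ~{fp32_gb:.1f} GB")  # ~6.0 GB, already tight on an 8GB Jetson
print(f"q4:   ~{q4_gb:.1f} GB")    # under 1 GB of weights, leaving room for the KV cache
```

The roughly 8x reduction is what makes it realistic to run the model alongside the rest of the robot's software stack on an 8GB device.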


Proof of Concept

We can quickly build a proof of concept (PoC) using llama.cpp to load the Qwen model.

The Prompt

You are the command center for an advanced robotic dog. Your role is to interpret user inputs and generate appropriate commands for the dog to execute. The available commands are:
- turn right
- turn left
- move forward
- move backward
- dance
- bark

Based on the user's input, create a list of one or more commands for the robotic dog. Output the commands as a JSON array.
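
The prompt above can be wired up with the llama-cpp-python bindings. The following is a sketch rather than the exact PoC code: the model file name comes from the text, while the sampling settings and function names are assumptions.

```python
# Minimal sketch of the PoC loop using llama-cpp-python
# (pip install llama-cpp-python). The message-building helper is pure
# Python; the model call is kept behind main() since it needs the GGUF file.

SYSTEM_PROMPT = (
    "You are the command center for an advanced robotic dog. "
    "Your role is to interpret user inputs and generate appropriate commands "
    "for the dog to execute. The available commands are: turn right, turn left, "
    "move forward, move backward, dance, bark. Based on the user's input, "
    "create a list of one or more commands for the robotic dog. "
    "Output the commands as a JSON array."
)

def build_messages(user_input: str) -> list[dict]:
    """Assemble the chat messages sent to the model on every request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

def main() -> None:
    from llama_cpp import Llama  # imported lazily: heavy, needs the model file

    llm = Llama(
        model_path="qwen2.5-1.5b-instruct-q4_k_m.gguf",
        n_ctx=500,        # small context window keeps VRAM usage low
        n_gpu_layers=-1,  # offload all layers to the GPU
    )
    reply = llm.create_chat_completion(
        messages=build_messages("轉右向前行兩步,再吠兩聲嚟聽吓"),
        temperature=0.0,  # deterministic decoding for command generation
    )
    print(reply["choices"][0]["message"]["content"])

if __name__ == "__main__":
    main()
```

In practice the model's reply may still need light post-processing (stripping surrounding text) before it can be parsed as JSON.
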

Sample Results

Two different Cantonese phrases are listed here (English translations are provided in brackets). The first one is straightforward, while the second requires understanding the user's intention to generate a command list.

Sample 1:
In this case, the model accurately interpreted the user's straightforward command, providing a sequence of actions for the robotic dog.

轉右向前行兩步,再吠兩聲嚟聽吓 (Turn right, move forward two steps, and then bark twice)

["turn right", "move forward", "move forward", "bark", "bark"]

Sample 2:
In the following case, the model was able to understand the user’s intention and interpreted "cheering up" as asking the robotic dog to perform an action that would be entertaining, like dancing. This showcases the model’s ability to grasp user sentiment and respond creatively.

我今日好唔開心,可以氹吓我嗎? (I'm feeling very sad today, can you cheer me up?)

["dance", "jump"]


Performance Summary

Running the quantized Qwen model with llama.cpp produced the following performance results:

  • Response Time: ~700 milliseconds on average on an Nvidia T4 card. This means the model processed the input and generated commands in well under a second, ensuring a fluid interaction between the user and the robotic dog.
  • VRAM Usage: 2.8GB with default settings. By setting the maximum context length to only 500 tokens, the VRAM usage was reduced to 1.4GB, which is well within the 8GB limit of the Jetson GPU.

The efficient use of memory and fast response time demonstrated the feasibility of running LLMs on edge devices, even for multilingual applications.


Key Takeaways

The PoC demonstrated that it is possible to use a quantized version of a multilingual language model for real-time robotic control on edge devices. The key benefits included:

  1. Multilingual Support: The ability to understand commands in both English and Cantonese expanded the usability and flexibility of the robotic dog.
  2. Edge Device Compatibility: By using a smaller, quantized model, the AI was able to run efficiently on limited hardware without compromising performance.
  3. Real-Time Performance: Fast response times ensured that the robotic dog could react promptly, making interactions feel natural and engaging.

This proof of concept paves the way for more advanced, language-based control systems for robots that can be deployed on edge devices, making them more accessible and practical for various real-world applications.

Sunday, September 01, 2024

Llama 3 and 3.1: Advancements in AI Translation Capabilities

In the rapidly evolving field of artificial intelligence, Llama 3.1 has garnered significant attention since its release six weeks ago. This model builds upon the foundation laid by Llama 3, which debuted three months prior. While Llama 3 introduced impressive capabilities that hinted at the future potential of AI language models, it also revealed certain limitations that have been further explored and addressed with Llama 3.1.

Multilingual Challenges in AI

One of the critical areas where these limitations have come to light is in multilingual tasks, particularly translation. As AI continues to integrate more deeply into global communications, the ability to accurately and seamlessly translate between languages is paramount. However, recent tests with Llama 3 have shown that this is an area where even the most advanced models can face challenges.

Observations with Llama 3

Initial experiments with Llama 3, specifically using the 8B instruct model, revealed an interesting phenomenon. While the model performed admirably when asked to answer questions, its translation capabilities were less reliable. When prompted in English to translate a passage into Chinese, the model often returned incoherent output, far from an accurate translation.

What is red-tailed hawk?

The red-tailed hawk (Buteo jamaicensis) is a bird of prey that breeds throughout most of North America, from the interior of Alaska and northern Canada to as far south as Panama and the West Indies. It occupies a wide range of habitats and altitudes, including deserts, grasslands, forests, agricultural fields, and urban areas. Red-tailed hawks are opportunistic generalists and their diet is variable. In North America, they are most often predators of small mammals such as rodents. Prey that is terrestrial and at least partially diurnal is preferred, so types such as ground squirrels are preferred where they naturally occur. The bird is legally protected in Canada and Mexico, and in the United States by the Migratory Bird Treaty Act of 1918. This red-tailed hawk in flight was photographed at the Sacramento National Wildlife Refuge in California.

Please translate the answer to Chinese

(red téng sū chōu) (red téng sū chōu)

: , , ; , , , , . , , , .

RED-TAILED HAWK(Buteo jamaicensis) : , , , , . , , , .

To further investigate, tests were conducted with the 70B model, anticipating that its larger parameter count might overcome the translation issues observed with the 8B version. Surprisingly, the results were similar. Despite the significant increase in model size, the 70B model also struggled with translation when the prompt was given in English.

Key Insight: Language of Instruction

A crucial discovery was made when the prompt language was adjusted to Chinese. Both the 8B and 70B models were able to translate the passage correctly when instructions were given in the target language itself. This finding suggests that the models' translation capabilities are more effectively activated when the context is set in the language they are translating into.

Advancements with Llama 3.1

Further testing with Llama 3.1 has shown marked improvements. Both the 8B and 70B models performed well with English prompts, demonstrating enhanced translation capabilities compared to their predecessors.

Implications for Current Users

Despite the release of Llama 3.1 over a month ago, many organizations are still utilizing Llama 3. For these users, it is recommended to implement programmatic mitigation strategies:

  1. Detect translation requests within the system
  2. Translate the prompt itself into the target language before passing it to Llama 3
  3. Send the adjusted prompt to the model to obtain a reliable translation
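
These steps can be sketched as follows. The regex and the pre-translated instruction templates are illustrative assumptions, not a production-ready language detector:

```python
# Sketch of the Llama 3 mitigation: when a user asks (in English) for a
# translation into language X, rewrite the instruction itself into X
# before sending it to the model.
import re

# Pre-translated instruction templates, keyed by target language.
INSTRUCTION_TEMPLATES = {
    "chinese": "請把以下文字翻譯成中文:\n{text}",
    "french": "Veuillez traduire le texte suivant en français :\n{text}",
}

def build_translation_prompt(user_request: str, text: str) -> str:
    """Detect a translation request and emit a target-language prompt."""
    match = re.search(r"translate .*?(?:to|into) (\w+)", user_request, re.IGNORECASE)
    if match:
        language = match.group(1).lower()
        template = INSTRUCTION_TEMPLATES.get(language)
        if template:
            return template.format(text=text)
    # Not a recognized translation request: pass the original through.
    return f"{user_request}\n{text}"
```

Because the instruction now arrives in the target language, Llama 3's translation behavior is activated as described in the Key Insight section above.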

By adopting these strategies, organizations can maximize the potential of their current AI translation tools, even if they have not yet upgraded to the latest model.

Monday, December 18, 2023

The State of Generative AI: A Survey of 2023 and Outlook for 2024

Introduction

The year 2023 witnessed significant advancements in the field of Artificial Intelligence (AI), particularly in the realm of generative AI. This article provides an in-depth survey of the key trends in AI development in 2023 and offers insights into the outlook for 2024, shedding light on the transformative effects of generative AI.

Key Trends in AI Development in 2023

Generative AI Advancements

The year 2023 marked a period of remarkable progress in generative AI, with substantial implications for industries and workforces. The State of AI in 2023 report by McKinsey highlighted the transformative effects of generative AI, emphasizing its potential to revolutionize various sectors. OpenAI's GPT-4 emerged as a groundbreaking generative AI model, revolutionizing natural language processing and creative content generation. The explosive growth of generative AI (gen AI) tools was confirmed by the latest annual McKinsey Global Survey, with one-third of survey respondents using gen AI regularly in at least one business function.

Multi-modal Learning

Another notable trend in AI development was the emergence of multi-modal learning, which brought forth new possibilities for AI models and their applications. This approach enabled AI systems to process and understand information from multiple modalities, leading to enhanced performance and versatility.

Economic Impact of AI

The economic impact of AI became increasingly pronounced in 2023, influencing global economies and driving discussions on the future of work and productivity. AI could contribute up to $15.7 trillion to the global economy by 2030, more than the current output of China and India combined. Of this, $6.6 trillion is likely to come from increased productivity and $9.1 trillion from consumption-side effects.

Outlook for 2024

Monumental Leaps in AI Capabilities

The outlook for 2024 foresees monumental leaps in AI capabilities, particularly in areas demanding complex problem-solving, fueled by quantum advancements. These advancements are expected to redefine the boundaries of AI applications and unlock new frontiers in technology.

Rapid Transformations Across Industries

2024 is poised to witness rapid transformations across industries, with generative AI expected to play a pivotal role in reshaping business operations and driving regulatory discourse. Recent research examines the real-world application of AI in multiple sectors, including healthcare, finance, agriculture, retail, energy, and automotive, with case studies illustrating the impact of AI on these industries. The AI Dossier highlights the most compelling use cases of AI in six major industries, providing insights into the practical applications of AI across various sectors.

Challenges and Potential Failures in Generative AI Initiatives

Despite the promising outlook, 2024 also brings forth challenges and potential failures in generative AI initiatives. Predictions suggest that the majority of such initiatives may face obstacles and encounter failures, underscoring the complexities and uncertainties associated with generative AI.

Industry Outlook for Generative AI

The industry outlook for generative AI reflects early promise, potential payoffs, and uncertainties. AI is helping organizations in the energy, resources, and industrials industry to rapidly innovate, reduce their climate impact, and increase business productivity. The impact of AI on a hospitality company has been studied, providing insights into the transformative effects of AI in the hospitality sector.

Impact of AI on Worldwide IT Spending

Projections for 2024 indicate a significant impact of AI on worldwide IT spending, with expectations of substantial growth driven by the integration of AI technologies across various sectors. The influence of AI on global IT spending is set to shape the landscape of technology investments and strategies.

Conclusion

The year 2023 marked a pivotal phase in the evolution of generative AI, setting the stage for transformative developments in the year ahead. As we look towards 2024, the landscape of AI is poised for monumental leaps, rapid transformations, and the navigation of challenges and uncertainties. The journey of generative AI continues to unfold, shaping the future of technology and innovation on a global scale.

Thursday, December 07, 2023

Gemini: A New Family of Multimodal Models

Have you ever wondered what it would be like to have a model that can understand and reason across different types of data, such as text, images, audio, and video? Well, wonder no more, because Google has just introduced Gemini, a new family of multimodal models that can do just that!

Gemini models are trained on a large and diverse dataset of image, audio, video, and text data, and can handle a wide range of tasks, such as summarizing web pages, translating speech, generating images, and answering questions. Gemini models come in three sizes: Ultra, Pro, and Nano, each designed for different applications and scenarios.

In this blog post, we will give you an overview of the Gemini model family, its capabilities, and some of the exciting use cases it enables. We will also discuss how Google is deploying Gemini models responsibly and ethically, and what are the implications and limitations of this technology.

What is Gemini?

Gemini is a family of highly capable multimodal models developed at Google. Gemini models are based on Transformer decoders, which are enhanced with improvements in architecture and model optimization to enable stable training at scale and optimized inference on Google’s Tensor Processing Units.

Gemini models can accommodate textual input interleaved with a wide variety of audio and visual inputs, such as natural images, charts, screenshots, PDFs, and videos, and they can produce text and image outputs. Gemini models can also directly ingest audio signals at 16kHz from Universal Speech Model (USM) features, enabling them to capture nuances that are typically lost when the audio is naively mapped to a text input.

Gemini models are trained jointly across image, audio, video, and text data, with the goal of building a model with both strong generalist capabilities across modalities and cutting-edge understanding and reasoning performance in each respective domain.

What can Gemini do?

Gemini models can perform a variety of tasks across different modalities, such as:

  • Text understanding and generation: Gemini models can understand natural language and generate fluent and coherent text for various purposes, such as summarization, question answering, instruction following, essay writing, code generation, and more. Gemini models can also handle multilingual and cross-lingual tasks, such as translation, transcription, and transliteration.
  • Image understanding and generation: Gemini models can understand natural images and generate captions, descriptions, questions, and answers about them. Gemini models can also generate images from text or image prompts, such as creating logos, memes, illustrations, and more. Gemini models can also handle complex image types, such as charts, diagrams, and handwritten notes, and reason about them.
  • Audio understanding and generation: Gemini models can understand audio signals and generate transcripts, translations, summaries, and questions and answers about them. Gemini models can also generate audio from text or audio prompts, such as synthesizing speech, music, sound effects, and more. Gemini models can also handle different audio types, such as speech, music, and environmental sounds, and reason about them.
  • Video understanding and generation: Gemini models can understand videos and generate captions, descriptions, questions, and answers about them. Gemini models can also generate videos from text or video prompts, such as creating animations, clips, trailers, and more. Gemini models can also handle different video types, such as movies, documentaries, lectures, and tutorials, and reason about them.
  • Multimodal understanding and generation: Gemini models can understand and reason across different types of data, such as text, images, audio, and video, and generate multimodal outputs, such as image-text, audio-text, video-text, and image-audio. Gemini models can also handle complex multimodal tasks, such as verifying solutions to math problems, designing web apps, creating educational content, and more.

Evaluation

Gemini models have achieved state-of-the-art results on various benchmarks and tasks, demonstrating their strong performance and generalization capabilities. For example, Gemini models have achieved the following results:

  • MMLU (Massive Multitask Language Understanding): Gemini models have achieved the highest score on the MMLU benchmark, which measures models' knowledge and reasoning across a broad range of academic and professional subjects. Gemini models outperformed previous state-of-the-art models, especially on questions that require multi-step reasoning.
  • MMMU (Massive Multi-discipline Multimodal Understanding): Gemini models have achieved the highest score on the MMMU benchmark, which measures the ability of models to solve college-level problems that combine text with images such as charts, diagrams, and maps. Gemini models outperformed other models especially on tasks that require cross-modal reasoning.
  • ChartQA: Gemini models have achieved the highest score on the ChartQA benchmark, which measures the ability of models to answer questions about charts and graphs. Gemini models have outperformed other models by a large margin, especially on tasks that require complex reasoning and inference.
  • CoVoST 2: Gemini models have achieved the highest score on the CoVoST 2 benchmark, which measures the ability of models to translate speech in one language into text in another (speech-to-text translation) across multiple language pairs, a task that requires both cross-modal and cross-lingual reasoning.

These results demonstrate the impressive performance and potential of Gemini models for various applications and domains.

Applications

Gemini models have many potential use cases and benefits for various domains and users, such as:

  • Education: Gemini models can help students and teachers to learn and teach more effectively and efficiently, by providing personalized feedback, generating educational content, and facilitating communication across languages and modalities. For example, Gemini models can help students to solve math problems, learn new languages, and create multimedia projects.
  • Creativity: Gemini models can help artists and designers to create and explore new ideas and styles, by generating images, music, and videos based on their prompts and preferences. For example, Gemini models can help designers to create logos, posters, and websites, and help musicians to compose songs and soundtracks.
  • Communication: Gemini models can help people to communicate and collaborate more effectively and inclusively, by providing real-time translation, transcription, and summarization of speech and text across languages and modalities. For example, Gemini models can help business people to negotiate deals, scientists to share research findings, and activists to raise awareness about social issues.
  • Information Extraction: Gemini models can help researchers and analysts to extract and summarize relevant information from large and complex datasets, by processing text, images, audio, and video data. For example, Gemini models can help journalists to investigate news stories, analysts to predict market trends, and doctors to diagnose diseases.
  • Problem Solving: Gemini models can help people to solve complex problems and make informed decisions, by providing accurate and reliable information, and reasoning across different modalities and domains. For example, Gemini models can help engineers to design new products, lawyers to argue cases, and politicians to make policies.

These applications demonstrate the versatility and potential impact of Gemini models for various users and domains.

Limitations and Challenges

Gemini models also have some limitations and challenges that need to be addressed and mitigated, such as:

  • Factuality: Gemini models may generate outputs that are not factually accurate or reliable, especially when dealing with complex or ambiguous information. Gemini models may also generate outputs that are biased or offensive, especially when trained on biased or offensive data. To mitigate these issues, Gemini models are developed and evaluated using rigorous and diverse datasets and metrics, and are subject to human review and feedback.
  • Hallucination: Gemini models may generate outputs that are not coherent or meaningful, especially when dealing with rare or unseen data. Gemini models may also generate outputs that are unrealistic or inappropriate, especially when trained on unrealistic or inappropriate data. To mitigate these issues, Gemini models are trained and evaluated using diverse and realistic datasets and metrics, and are subject to human review and feedback.
  • Safety: Gemini models may generate outputs that are harmful or dangerous, especially when dealing with sensitive or confidential data. Gemini models may also generate outputs that violate privacy or security, especially when trained on private or sensitive data. To mitigate these issues, Gemini models are developed and deployed following a structured approach to impact assessment, model policies, evaluations, and mitigations, and are subject to legal and ethical compliance.
  • Ethics: Gemini models may generate outputs that perpetuate or amplify social biases or stereotypes, especially when trained on biased or stereotypical data. Gemini models may also generate outputs that violate cultural norms or values, especially when trained on data from different cultures or contexts. To mitigate these issues, Gemini models are developed and evaluated using diverse and representative datasets and metrics, and are subject to human review and feedback. Gemini models are also designed to be transparent and interpretable, so that users can understand how they work and what they learn from the data.

Gemini is a new model compared to other large language models such as GPT-4, which have already been deployed and used in many applications. Given that effective prompts can differ considerably between LLMs, Gemini will face competition from established models despite its state-of-the-art benchmark performance. However, Gemini has some unique features and advantages that distinguish it from other LLMs, such as its multimodal capabilities, its computing infrastructure, and its advanced training and optimization techniques. It will be interesting to see how Gemini evolves and competes with other LLMs in the future, and how it will benefit humanity in various ways.



Sunday, November 05, 2023

The Next Evolution of AI: Introducing Self-Reflecting Multi-Agent RAG Systems

Artificial intelligence systems have seen remarkable advances in recent years through innovations like large language models and multi-modal architectures. However, critical weaknesses around reasoning, factual grounding, and self-improvement remain barriers to robust performance. An emerging approach combining collaborative multi-agent interaction, retrieval of external knowledge, and recursive self-reflection holds great promise for overcoming these limitations and ushering in a new evolution of AI capabilities.

Multi-agent architectures draw inspiration from the collaborative intelligence exhibited by human groups and natural systems. By dividing cognitive labor across specialized agents, strengths are amplified and limitations mitigated through symbiotic coordination. Retrieval augmented systems fuse this collective competence with fast access to external databases, texts, and real-time data - providing vital context missing within isolated models today.

Building recursive self-reflection into both individual agents and the system as a whole closes the loop, enabling continuous optimization of reasoning processes in response to insights gained through experience. This capacity for introspection and deliberation allows the system to operate in a manner closer to deliberative, System 2 thinking.

For example, by assigning distinct agents specialized roles in critiquing assertions made by peer agents, the system essentially institutes deliberate verification procedures before accepting conclusions. Facts retrieved from external sources can validate or invalidate the logic applied, promoting evidence-based rational thinking. And recursive self-questioning provides a mechanism for critical analysis of internal beliefs and assumptions that overrides blindspots.

Together, these three mechanisms offer the foundations for AI systems capable of open-ended self-improvement through rigorous reasoning and abstraction. Peer critiquing across diverse agents identifies gaps or contradictions in thinking. Real-world data grounds beliefs in empirical facts. And recursive self-questioning pressure tests the integrity of all assertions made. Through such recurring examination, vulnerabilities are uncovered and addressed, incrementally strengthening the system's intelligence.

While substantial research is still needed to effectively integrate these capabilities, the promise is profound. Such an architecture could greatly augment AI aptitude for deep domain expertise, creative speculation, and nuanced judgment - hallmarks of advanced general intelligence. The result is systems exhibiting the flexible cognition, contextual reasoning, and capacity for lifelong learning characteristic of human minds but at machine scale.


Wednesday, November 01, 2023

FLOW Pattern: Comprehensive Knowledge Extraction with Large Language Models

Intent:

The FLOW pattern aims to enable the systematic extraction, organization, and synthesis of comprehensive knowledge from large documents or transcripts using large language models, such as GPT-3. It addresses the limitation of these models' context window by chunking large documents across the different stages of the pattern.

Motivation:

Large language models often have a limited context window, which restricts their ability to ingest and analyze complete large documents at once. The FLOW pattern seeks to overcome this limitation by breaking down the document into manageable chunks, enabling a more comprehensive analysis and synthesis of information.

Implementation:

  1. Find: Extract specific topics or perspectives from the document or transcript in manageable chunks, considering the limitation of the model's context window.
  2. Link: Synthesize the extracted content from various chunks of the document or transcript, ensuring coherence and connectivity between different elements, while noting that the information comes from different parts of the source material.
  3. Organize: Structure the linked content in a logical and coherent manner, facilitating a systematic analysis of the document's core concepts and themes using the model's iterative capabilities.
  4. Write: Expand and elaborate on the organized content, producing a well-developed document that is informative, engaging, and accessible to the target audience, leveraging the model's text generation capabilities.
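
The four stages above can be sketched as a simple pipeline. The chunk size, the prompt strings, and the `call_llm` client are all illustrative assumptions:

```python
# Illustrative skeleton of the FLOW pattern (Find, Link, Organize, Write).
# `call_llm` stands in for any LLM client that takes a prompt and returns text.

def chunk(text: str, max_chars: int = 4000) -> list[str]:
    """Find-stage helper: split a long transcript into context-sized chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def flow(transcript: str, call_llm) -> str:
    # Find: extract topics from each chunk independently.
    notes = [call_llm(f"Extract the key topics:\n{c}") for c in chunk(transcript)]
    # Link: synthesize the per-chunk extracts into one coherent set of notes.
    linked = call_llm("Link these notes into a coherent summary:\n" + "\n".join(notes))
    # Organize: impose a logical structure on the linked content.
    organized = call_llm("Organize this summary into sections:\n" + linked)
    # Write: expand the outline into the final document.
    return call_llm("Write a polished document from this outline:\n" + organized)
```

Because only one chunk (plus accumulated notes) is in the prompt at a time, no single call exceeds the model's context window.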

Consequences:

By implementing the FLOW pattern with large language models, organizations can effectively overcome the limitations of the context window, enabling a more comprehensive analysis of large documents and transcripts. This, in turn, enhances the model's ability to provide meaningful insights and make informed decisions based on a more holistic understanding of the underlying information.

Example Use Case:

An organization utilizes the FLOW pattern with a large language model like GPT-3 to create comprehensive meeting minutes from a transcript, effectively overcoming the context window limitation by chunking the document into manageable sections for analysis and synthesis.

Summary:

The FLOW pattern, when implemented with large language models, serves as an effective approach for comprehensive knowledge extraction from large documents or transcripts. By addressing the context window limitation, it enables organizations to derive meaningful insights and make informed decisions based on a more comprehensive understanding of the underlying information.

Thursday, January 26, 2023

Transforming Document Exploration: Using GPT-3 to Create a Chatbot that Answers Your Questions

GPT-3 is the underlying model of ChatGPT. It is an advanced language model that can generate text, but it cannot find information that is not already in the model. In addition, there are limitations on how much text can be fed to GPT-3 at once. In this article, we will show you how to use a technique called conditional generation to build a chatbot that can help you explore long documents.

Conditional generation is a way of training a language model to understand the context of a document and generate answers to questions about it. We will take the following steps to build the chatbot:

  1. Create a corpus from the document.
  2. Match the user's question to the relevant passages in the corpus.
  3. Create a prompt that includes the context of the question.
  4. Send the prompt to GPT-3 to generate an answer.
  5. Create a function that combines steps 2-4.
  6. Test the chatbot by creating a question-answer loop.
  7. Use Gradio to create an interface for the chatbot.

In this exercise, we will build a chatbot to answer questions from an employee handbook.

Create a corpus from the document

First, we have prepared a file HKIHRMEmployeeHandbook.txt, which is a text version of a sample employee handbook created by the Hong Kong Institute of Human Resource Management.

To start, we use a Python library called "sentence-transformers" to create a corpus of embeddings from the text document "HKIHRMEmployeeHandbook.txt". The sentence-transformers library is an NLP library that makes use of pre-trained transformer-based models, like BERT, RoBERTa, DistilBERT, and, in this case, multi-qa-mpnet-base-cos-v1, to generate embeddings for each sentence in the corpus. This is useful for our chatbot application because it allows us to find the most relevant passages in the document for the user's query, which helps provide more accurate and useful answers.

Embeddings are a way to represent text in a numerical form that can be used for comparison and similarity calculation by the model. In this case, we are using sentence-transformers library to convert the text into embeddings. These embeddings are a compact and dense representation of the text, which can be used to compare and measure similarity between different texts.

There are several advantages of using embeddings instead of keyword search when building a chatbot for document exploration:
  • Semantic Similarity: Embeddings capture the semantic similarity between sentences, which allows for more accurate and relevant matches between the user's query and the passages in the corpus.
  • Handling synonyms and variations: Embeddings can handle synonyms and variations in the query, which are difficult to handle with traditional keyword search. For example, if the user asks "what are the benefits of working overtime?" and the document talks about "compensation for working beyond regular hours", the embeddings will be able to match the query with the relevant passage, even though the words in the query and document are different.
  • Handling context: Embeddings can also capture the context in which a word is used, which allows for more accurate matches between the query and the passages in the corpus. For example, "what are the benefits of working overtime?" and "what are the drawbacks of working overtime?" have similar keywords but opposite meanings; embeddings help to understand the context and give the correct answer.
  • Scalability: Embeddings can handle large datasets and can be used to match queries against a large corpus of documents efficiently.
Overall, using embeddings instead of keyword search improves the accuracy and relevance of the chatbot's answers, and provides a better user experience.

Match the user's question to the relevant passages in the corpus

Once we have the embeddings, we can find the passages in the document most similar to a given query. The util.semantic_search method finds the most similar passages in the corpus based on the query embedding and the corpus embeddings. The top_k variable controls how many passages to retrieve. The retrieved passages are then joined together with line breaks to form the context. The result is as follows:

2.2 Working Hours
3.2 Overtime Compensation Prior approval must be obtained from respective supervisor for working overtime. Overtime for employees at grade 7-10 will be compensated by pay at the following rates: Category 1 a) For non-shift employees, Monday to Friday starting from an hour after normal working hour to 12:00 midnight and Saturday after 2:00 p.m. b) For shift employees, overtime worked beyond his shift 1.5 times hourly rate Overtime for employees at grade 4-6 will be compensated by way of time off. Such compensation leave will only be granted when their department's workload permits. Any overtime not compensated in the form of leave by the end of the calendar year will be paid in the following February at the rate of 1.5 times the employee's hourly rate.
3.3 Annual Bonus Employees who work overtime in the following conditions are entitled to claim for meal allowance: - overtime for 3 consecutive hours from Monday to Saturday; - overtime for 6 consecutive hours on Sunday or public holidays Employees who are required to report for duty outside office hours in case of emergency will be entitled to an emergency allowance and compensation leave
4.5 Compensation Leave or Time-off for Overtime Work Employees at grade 4-6 who have worked overtime will be compensated by way of time off. Such compensation leave will only be granted when their department's workload permits. Please also refer to “Overtime Compensation” under Section 3.2 of this handbook for details.

You can observe that all the passages retrieved are more or less relevant to the overtime question.
 

Create a prompt that includes the context of the question

Then we can create a prompt template. The prompt is filled in and sent to GPT-3 to generate an answer for the query based on the context. The following prompt will be created:

Context: 2.2 Working Hours
3.2 Overtime Compensation Prior approval must be obtained from respective supervisor for working overtime. Overtime for employees at grade 7-10 will be compensated by pay at the following rates: Category 1 a) For non-shift employees, Monday to Friday starting from an hour after normal working hour to 12:00 midnight and Saturday after 2:00 p.m. b) For shift employees, overtime worked beyond his shift 1.5 times hourly rate Overtime for employees at grade 4-6 will be compensated by way of time off. Such compensation leave will only be granted when their department's workload permits. Any overtime not compensated in the form of leave by the end of the calendar year will be paid in the following February at the rate of 1.5 times the employee's hourly rate.
3.3 Annual Bonus Employees who work overtime in the following conditions are entitled to claim for meal allowance: - overtime for 3 consecutive hours from Monday to Saturday; - overtime for 6 consecutive hours on Sunday or public holidays Employees who are required to report for duty outside office hours in case of emergency will be entitled to an emergency allowance and compensation leave
4.5 Compensation Leave or Time-off for Overtime Work Employees at grade 4-6 who have worked overtime will be compensated by way of time off. Such compensation leave will only be granted when their department's workload permits. Please also refer to “Overtime Compensation” under Section 3.2 of this handbook for details.
Answer the following question:
Q: what should I do if I worked overtime?
A:

You can try a few other forms like:
  • Answer the following question. If the answer cannot be found in the context, write "I don't know"
  • Answer the following question. If the answer cannot be found in the context, rewrite the question so that it is more related to the context, starting with "Do you want to ask", and then elaborate on the answer
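The template shown above can be sketched as a small helper (build_prompt is a hypothetical name for this sketch, not necessarily the one used in the notebook):

```python
def build_prompt(context, question):
    """Fill the template: the retrieved context first, then the question."""
    return (
        f"Context: {context}\n"
        "Answer the following question:\n"
        f"Q: {question}\n"
        "A:"
    )
```

Ending the prompt with "A:" invites the model to continue the text with the answer itself, which is the essence of conditional generation here.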

Send the prompt to GPT-3 to generate an answer

The following code uses the OpenAI API to generate a response for the prompt. GPT-3 will come up with an answer that fits the context:
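A minimal sketch of that call, assuming the legacy 0.x OpenAI SDK and its Completions endpoint (contemporary with this post); the API key is a placeholder, and the client parameter is a hypothetical hook added here so the function can be exercised without a live API key:

```python
def generate_answer(prompt, client=None):
    """Send the prompt to GPT-3 and return the generated text."""
    if client is None:
        import openai  # legacy 0.x SDK
        openai.api_key = "YOUR_API_KEY"  # placeholder: set your own key
        client = openai.Completion
    response = client.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=256,
        temperature=0,
    )
    return response["choices"][0]["text"].strip()
```

Setting temperature to 0 keeps the answers deterministic and closely tied to the supplied context, which suits a document-exploration chatbot.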

If you are an employee at grade 7-10, you should obtain prior approval from your supervisor and you will be compensated by pay at the rate of 1.5 times your hourly rate. If you are an employee at grade 4-6, you will be compensated by way of time off. Such compensation leave will only be granted when your department’s workload permits. Any overtime not compensated in the form of leave by the end of the calendar year will be paid in the following February at the rate of 1.5 times your hourly rate.

Create a function that wraps up everything

Wrapping the code into a function called answer(query) allows for easy reuse. This function takes in a query string and returns the response from the API, making it easy to try different queries without repeating the previous steps every time.
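One way to sketch this composition, with the retrieval, prompting, and generation steps passed in as functions (make_answer is a hypothetical name introduced here for illustration):

```python
def make_answer(retrieve, build_prompt, generate):
    """Compose retrieval (step 2), prompt creation (step 3), and
    GPT-3 generation (step 4) into a single answer(query) function."""
    def answer(query):
        context = retrieve(query)
        return generate(build_prompt(context, query))
    return answer
```

In the notebook, the three pieces would be the passage-retrieval, prompt-template, and API-call code from the earlier sections.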

Test the chatbot by creating a question-answer loop

Here, we have an infinite loop that repeatedly prompts the user for a query and exits when the user enters "xxx". This lets you input multiple queries and see the answers generated by GPT-3 without having to restart the code, making it easy to test the function. The output will be as follows:

Q: do I get paid if I got sick?
A: Yes, employees who are not able to report for duty due to illness are entitled to full pay sick leave, subject to a maximum of the lesser of 120 days or the number of sickness days that the employee has accumulated according to provisions in the Employment Ordinance. Additionally, employees who suffer from injury arising out of and in the course of employment are entitled to compensation in accordance with the Employee Compensation Ordinance.
======================================================================
Q: what should I do if I got sick?
A: If I become ill, I should notify my immediate supervisor as soon as possible and provide a medical certificate if necessary. I am entitled to full pay sick leave, subject to a maximum of 120 days or the number of sickness days I have accumulated according to the Employment Ordinance. If I have to leave during office hours due to sickness or personal reasons, I must obtain prior approval from either my supervisor, department manager or the Human Resources Department.
======================================================================
Q: xxx
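The question-answer loop above can be sketched as follows; the input_fn and print_fn parameters are hypothetical hooks added here so the loop can be tested without a live console:

```python
def qa_loop(answer, input_fn=input, print_fn=print):
    """Prompt for queries until the user enters the sentinel "xxx"."""
    while True:
        query = input_fn("Q: ")
        if query.strip() == "xxx":
            break
        print_fn("A:", answer(query))
        print_fn("=" * 70)
```

Run with the defaults, qa_loop(answer) reads from the keyboard and prints each answer followed by a separator line, as in the transcript above.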

Use Gradio to create an interface for the chatbot

Finally, let's use the gradio library to create a graphical user interface (GUI) for the answer(query) function defined earlier. Now you have a more user-friendly way to ask questions:
Try it out in Colab.