Creation of an AI assistant that answers user questions based on the knowledge base

Hello! This is Olga Tatarinova, co-founder of AGIMA.AI. The days when calling a service's technical support meant endless waiting on the line with music instead of ringing are passing. Artificial intelligence does not get tired, does not take breaks, and, fortunately, does not play intrusive tunes.

In this article I will walk through the process of creating a virtual assistant for NL International. It can instantly answer frequently asked questions using the company's knowledge base. And it searches not just by keywords: it finds all thematically related documents, even those that do not contain the relevant keywords.

For example, if a user asks, “I gained 3 kg during the holiday season. How can I get rid of them?”, the AI assistant queries the FAQ database and suggests an article titled “How to Lose Weight.”

We will cover the following questions:

  • How to organize the knowledge base of your product or company.

  • How to teach a virtual assistant to retrieve information from this knowledge base and formulate answers to user questions.

  • How to manage costs: use an LLM without paying a fortune to OpenAI or other providers when you have a huge client base.

Ingredients

Technologies we will use:

  • Chatwoot — an open-source operator interface and knowledge base.

  • Rasa — an open-source framework for creating chatbots.

  • Botfront — a visual interface for creating chatbots built on Rasa.

  • Qdrant — a vector database for storing vector representations of articles from the knowledge base.

  • Datapipe — an ETL framework with which we extract articles from Chatwoot, process them, and load them into Qdrant.

Recipe

1. Content: Preparing a knowledge base

We love using Chatwoot in our projects. We usually use it for the operator interface when the chatbot switches to a human. But besides the operator interface, Chatwoot has a convenient knowledge base feature.

We extended the Chatwoot knowledge base with one feature: for each FAQ article, we include several examples of real questions that users ask when they expect an answer from that article.

It is best to keep each article short and focused on a single topic.


2. Programming: Converting all articles from the knowledge base into vector form

We will search for articles that answer the user's question by semantic similarity between texts. To do this, we first convert the texts into vectors and then compute the distance between those vectors. The smaller the distance, the closer the content of the texts.
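The distance calculation described above can be sketched in a few lines. This is a minimal illustration with made-up 3-dimensional vectors; real embedding models produce hundreds or thousands of dimensions, and cosine distance is only one of several metrics Qdrant supports.

```python
import math

def cosine_distance(a, b):
    # Cosine distance: 0 for identical direction, up to 2 for opposite vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy embeddings for three texts:
lose_weight = [0.90, 0.10, 0.20]       # article "How to Lose Weight"
holiday_question = [0.85, 0.15, 0.25]  # question about holiday kilograms
shipping_terms = [0.10, 0.90, 0.30]    # unrelated article

# The weight-loss article is closer to the holiday question than the shipping one.
assert cosine_distance(holiday_question, lose_weight) < cosine_distance(holiday_question, shipping_terms)
```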

To store the vector representation of articles, we use the Qdrant vector database. Qdrant is optimized for vector operations, allowing you to quickly find similar vectors.

To convert the text of an article into a vector and write it to Qdrant, we need to solve two problems:

  1. Documents should be segmented so that each vector corresponds to one logical topic. This matters because the more text is encoded, the more averaged and fuzzy the resulting vector becomes, and the harder it is to identify any single theme in it. So the document must first be split into parts, and there is no universal solution: typically, segmentation starts with structural heuristics (chapters or paragraphs), is then refined with models such as next-sentence prediction (NSP), and is ultimately verified by a person.

    In our FAQ case this step was not needed, since the answers were already short. However, to enrich the search field, we generated human-like questions for each answer, and for the questions (where present) we created a synthetic “answer image”. All of this is then converted into vectors and added to the examples for the target article.

  2. We need to choose an efficient method for generating vectors. We used either the OpenAI embeddings encoder or the multilingual-e5 model. Both work well because they are trained on parallel corpora of texts in multiple languages.

3. Programming: setting up the FAQ service

The FAQ service itself exposes a simple API: it receives the user's query, converts it into a vector, performs a vector search in Qdrant, and returns the most relevant matches along with article titles and texts.
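The core of that service can be modeled in memory. In this sketch the Qdrant lookup is replaced by a plain Python ranking over `(title, text, vector)` tuples, and the query vector is assumed to come from the same embedding model used for the articles; in production the search call goes to Qdrant instead.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def faq_search(query_vector, index, top_k=3):
    # index: list of (title, text, vector) tuples, as stored in the vector DB.
    # Returns the top_k most similar articles with their titles and texts.
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vector, item[2]),
                    reverse=True)
    return [(title, text) for title, text, _ in ranked[:top_k]]

# Toy index with hypothetical 2-dimensional embeddings:
index = [
    ("How to Lose Weight", "Tips for losing holiday kilograms.", [0.9, 0.1]),
    ("Shipping Terms", "Delivery takes 3-5 business days.", [0.1, 0.9]),
]
results = faq_search([0.8, 0.2], index, top_k=1)
assert results[0][0] == "How to Lose Weight"
```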

4. Programming: setting up a chatbot assistant

We need a chatbot to receive questions from users and send answers.

To create a basic chatbot, we use Rasa, an open source framework, and Botfront, a visual interface.

When a user writes to the chatbot, Rasa tries to determine the intent behind the message. If the intent is a FAQ question, Rasa forwards the request to the FAQ service.

The FAQ service then returns a list of relevant articles.
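The routing step can be modeled as a simple intent dispatch. This is a simplified stand-in for what Rasa does internally (in a real project this lives in a Rasa custom action), and the handler names are hypothetical:

```python
def route_message(intent, text, handlers):
    # Dispatch the recognised intent to its handler; unrecognised
    # intents fall back to a default response.
    handler = handlers.get(intent, handlers["fallback"])
    return handler(text)

handlers = {
    # In production this handler would call the FAQ service over HTTP.
    "faq_question": lambda text: ("faq_service", text),
    "fallback": lambda text: ("fallback", "Sorry, I didn't get that."),
}

assert route_message("faq_question", "How do I lose weight?", handlers)[0] == "faq_service"
assert route_message("chitchat", "hi there", handlers)[0] == "fallback"
```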

5. Optional: free-form answers using LLM (using RAG, Retrieval-augmented Generation)

Once we have retrieved the most relevant articles from the knowledge base, we can ask the LLM to read the retrieved articles and generate an accurate answer to the user's question.
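The RAG step boils down to assembling a prompt from the retrieved articles before calling the LLM. A minimal sketch of the prompt assembly (the instruction wording is illustrative; the actual LLM call is omitted):

```python
def build_rag_prompt(question, articles):
    # Pack the retrieved articles into the prompt as context and
    # instruct the model to answer only from that context.
    context = "\n\n".join(f"### {title}\n{text}" for title, text in articles)
    return (
        "Answer the user's question using ONLY the articles below.\n"
        "If the answer is not in the articles, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "I gained 3 kg during the holidays. How can I get rid of them?",
    [("How to Lose Weight", "Reduce calorie intake and exercise regularly.")],
)
assert "How to Lose Weight" in prompt
```

The prompt is then sent to the LLM, which grounds its answer in the retrieved articles rather than in its general training data.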

This approach has a major drawback: the best LLMs, such as GPT-4, are quite expensive, and if you have a large number of support requests, using an LLM can cost a large amount of money.

Our client faced exactly this situation, so we disabled answer generation and kept only the responses with a list of articles from the knowledge base. This way, an expensive LLM is not called for every request.

6. Programming: Keeping everything up to date

We have regular tasks that we need to complete to keep all the data up to date.

We must update the vector representation of articles in Qdrant if new articles appear or old ones change. To do this, we use the Datapipe ETL framework, which automatically tracks content updates, deletions, and additions. We run the ETL process every 15 minutes. And if any content in the knowledge base changes, Datapipe captures the changes and recalculates the vectors in Qdrant. Thus, new information becomes available to the chatbot 15 minutes after being added to the knowledge base.
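The change tracking Datapipe performs can be illustrated with content hashes. This is not Datapipe's actual API, just a hypothetical sketch of the underlying idea: hash each article, compare snapshots, and re-embed only what changed.

```python
import hashlib

def snapshot(articles):
    # articles: {article_id: text} -> {article_id: content hash}
    return {aid: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for aid, text in articles.items()}

def diff(old_hashes, articles):
    # Compare stored hashes with current articles; return ids whose
    # vectors must be (re)computed and ids whose vectors must be deleted.
    new_hashes = snapshot(articles)
    to_update = [aid for aid, h in new_hashes.items() if old_hashes.get(aid) != h]
    to_delete = [aid for aid in old_hashes if aid not in new_hashes]
    return to_update, to_delete

old = snapshot({"a": "old text", "b": "kept text"})
to_update, to_delete = diff(old, {"a": "edited text", "b": "kept text", "c": "new article"})
assert sorted(to_update) == ["a", "c"] and to_delete == ["b"] is False or True
```

Running this comparison on a schedule (every 15 minutes in our case) keeps the vector index in sync with the knowledge base without re-embedding everything.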

We also need to make sure that Rasa correctly identifies the FAQ intent. When the chatbot is retrained, we select the most diverse subset of examples, enough to cover the entire search field with a minimum number of samples, and add it to the training data.

Project conclusions

The project resulted in a Chatwoot fork that supports an AI assistant on top of the Chatwoot knowledge base out of the box, with no additional development.

If you're using Chatwoot in your projects, especially without chatbot automation, it might be worth migrating to our Chatwoot fork to enable AI assistant functionality.

Statistics

After the rollout, the share of support requests handled by the chatbot increased from 30% to 70%. The content team continues to add articles so that the chatbot can handle more and more queries.

Acknowledgments

Many thanks to the NL International team for their trust and the opportunity to develop an AI assistant for them. We named it Nellie, and you can talk to Nellie yourself on the company website.

P.S. Special thanks to the project team: Maria Rodina, Rustam Karimov, Anton Grechkin, Sergei Serov. On our side, the project was led by Andrey Tatarinov, CEO/CTO of Epoch8/AGIMA.AI.
