Use Case: Implementing a Smart Chatbot Based on the “Knowledge Map” Approach and the GigaChat LLM

Hello, Habr! My name is Alexander Suleikin; I am a Big Data solutions architect, PhD, and CEO of the IT company “DUK Technologies”. Together with our LLM implementation expert, Anatoly Lapkov, we have prepared an article on implementing a smart assistant in a large non-profit organization. Under the hood is the base model from Sber, GigaChat, but all the surrounding plumbing and the approach to the problem are our own. That is what this article is about.

The original problem

One of the main problems with using LLMs is hallucinations, which arise when the model misinterprets certain queries. A major cause is the splitting of the source text into chunks, which for various reasons is often done with errors or inaccuracies. For more details on the chunking process and its peculiarities, see, for example, this article: https://habr.com/ru/articles/779526/. Here we will only note that the process is currently hard to control when you need to improve the accuracy of retrieving the most relevant vectors from the vector database.

The latest trend in chunking is to use LLMs themselves for the splitting; more details on methods for splitting text into chunks can be found, for example, here: https://dzen.ru/a/Zj2O4Q5c_2j-id1H.
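For illustration, here is a minimal sketch of that idea. The llm_complete placeholder stands in for any chat-completion call (GigaChat, YandexGPT, etc.); the prompt wording and the <CHUNK> delimiter are our own assumptions, not taken from the linked materials.

    # Sketch of LLM-assisted chunking. `llm_complete` is a placeholder for a
    # chat-completion call to any base model; prompt and delimiter are
    # illustrative assumptions.

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM provider")

    def split_into_chunks(text: str) -> list[str]:
        """Ask the model to cut the text at semantic boundaries."""
        prompt = (
            "Split the text below into self-contained fragments, one topic "
            "per fragment. Separate fragments with a line containing only "
            "<CHUNK>.\n\n" + text
        )
        answer = llm_complete(prompt)
        return [part.strip() for part in answer.split("<CHUNK>") if part.strip()]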

However, despite all the recent progress in chunking, the quality of information retrieval over the chunks remains a problem. Many areas of knowledge, including user technical support assistants in any field, require more accurate, higher-quality model answers.

About smart assistants

Despite the relative novelty of the topic and the technology, generative AI keeps gaining popularity, and more and more genuinely successful industrial LLM implementations are appearing. One of these “standard” tasks, and seemingly one of the most promising for LLM-based solutions, is a smart technical support assistant: a digital employee that can not only understand and answer user questions, but also fully automatically create tickets in accounting systems and service desks, hold a basic dialogue with the client, enrich its knowledge base, and gradually learn more.

Automating technical support processes and the regulations for answering typical questions lends itself well to templating, at least within a single subject area. Many tasks really are typical, and the potential for “smart” LLM-based assistants / technical support employees is considerable. Moreover, we can expect a very serious level of LLM adoption for such problems in the near future, especially given the rapid development of both base models (OpenAI, YandexGPT, GigaChat, etc.) and of new methods, solutions, frameworks, approaches, and techniques for applying large base models to highly specialized business problems.

Knowledge Map Approach

A “knowledge map” is a RAG in which we, the developers, strictly control the information used to generate a response to a query. This control is achieved through the use of two techniques:

  1. Identifying query classes and developing a query classifier.

  2. Extracting and structuring data from source documentation.

For example, if we are developing a bot for the technical support of some service, one of the request classes could be registration in the service: “How do I get an account?”, “How do I register in the system?”, and so on.

To answer the queries of this class, we need the part of the documentation that describes the registration process.

We develop a module that uses the LLM to extract instructions from the customer's documentation and stores them in a convenient, accessible form, for example, in a database.
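As an illustration, such an extractor might look roughly like this. The llm_complete placeholder, the prompt wording, and the SQLite table layout are our assumptions for the sketch, not the project's production code.

    import sqlite3

    # Sketch of the extraction module: use the LLM to pull a step-by-step
    # instruction for a given query class out of the customer documentation
    # and store it in a database. Schema and prompt are illustrative.

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM provider")

    def extract_instruction(doc_text: str, query_class: str) -> str:
        prompt = (
            f"From the documentation below, extract a step-by-step "
            f"instruction on the topic '{query_class}'. "
            f"Reply with the steps only.\n\n{doc_text}"
        )
        return llm_complete(prompt)

    def store_instruction(db: sqlite3.Connection,
                          query_class: str, instruction: str) -> None:
        db.execute(
            "CREATE TABLE IF NOT EXISTS knowledge_map "
            "(query_class TEXT PRIMARY KEY, instruction TEXT)"
        )
        db.execute(
            "INSERT OR REPLACE INTO knowledge_map VALUES (?, ?)",
            (query_class, instruction),
        )
        db.commit()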

After that, we develop a query classifier: using the LLM, we determine the class of an incoming query, and if the query belongs to the registration class, a simple and unambiguous algorithm retrieves the previously prepared instruction and passes it to the LLM to generate the response.
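A matching sketch of the classifier and routing step, under the same assumptions (the class list, prompts, and llm_complete placeholder are illustrative; lookup_instruction reads what the extractor above has stored):

    # Sketch of the classifier and routing logic.

    KNOWN_CLASSES = ["registration", "request_status"]

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM provider")

    def classify(query: str) -> str:
        prompt = (
            "Assign the user question to exactly one class from the list "
            f"{KNOWN_CLASSES} or answer 'other'. "
            f"Reply with the class name only.\n\nQuestion: {query}"
        )
        answer = llm_complete(prompt).strip().lower()
        return answer if answer in KNOWN_CLASSES else "other"

    def answer_query(query: str, lookup_instruction) -> str:
        query_class = classify(query)
        if query_class == "other":
            # The stub: outside the bot's authority, hand over to an operator.
            return "FORWARD_TO_OPERATOR"
        instruction = lookup_instruction(query_class)  # exact, pre-extracted text
        prompt = (
            "Answer the question strictly using the instruction below; "
            "do not add anything of your own.\n\n"
            f"Instruction:\n{instruction}\n\nQuestion: {query}"
        )
        return llm_complete(prompt)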

As a result, we get a system in which hallucinations are excluded.

The system answers only those questions for which it has information. Moreover, unlike RAGs built on vector databases, here the LLM is handed precise and unambiguous information from which to generate the answer.

A side effect of this approach is that the system's authority has clear boundaries, and when a request falls outside them, the system signals this explicitly: a stub is triggered and the conversation is handed over to an operator. This approach is therefore applicable only when the list of topics is clearly limited, the accuracy of answers is critical, and incorrect assistant answers would carry high reputational costs.

This approach also makes it possible to integrate the system with CRMs, ERPs, service desks, and so on. If, while processing a request, we know exactly what it is about (its class), we know how to prepare the response. For example, for the question “What is the status of my request?” (for simplicity, let's agree that a user can have only one open request at a time), we can connect to the CRM, get the request status, and generate an accurate answer. In this case we do not develop an extractor for this request class; instead, we develop a connector to the accounting system (or take a ready-made one) and fetch the data for the answer on the fly.
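A minimal sketch of this on-the-fly path, assuming hypothetical crm_get_status and llm_complete helpers and the single-open-request simplification from the example above:

    # Sketch: for the request-status class, the answer is built from live
    # CRM data instead of a pre-extracted instruction. Both helpers below
    # are placeholders.

    def llm_complete(prompt: str) -> str:
        raise NotImplementedError("wire this to your LLM provider")

    def crm_get_status(user_id: str) -> str:
        raise NotImplementedError("wire this to the CRM / service desk API")

    def answer_status_query(user_id: str, query: str) -> str:
        status = crm_get_status(user_id)  # fetch the data at answer time
        prompt = (
            f"The user's current request has status '{status}'. "
            f"Politely answer the user's question: {query}"
        )
        return llm_complete(prompt)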

Case Study: How We Implemented a Smart Assistant Based on a “Knowledge Map” of the User Support Domain and the GigaChat Model

Description of the task

The task was stated as simply as possible: process customer requests to technical support.

Project limitations

The project had several limitations that greatly impacted both the solution architecture and some conceptual aspects of the implementation:

  • The customer wanted the base model to be domestic and ruled out Open-Source models. The choice, in effect, came down to YandexGPT or GigaChat.

  • A very short implementation period: one month. Development ran non-stop, while most of the real documents (user questions and answers accumulated over several years of technical support operation) arrived only a week before delivery.

  • Several topics were strictly forbidden: questions about politics, and the restricted list also included the new regions of the Russian Federation (a topic GigaChat handled very poorly, at least in its censored version).

  • Requests had to be processed with high quality, since the reputational risks were considerable. At the slightest uncertainty in an answer, the request had to be forwarded to an operator.

Implementation issues

The key problem of the project turned out to be how late the customer delivered the source information.

While the customer was preparing the data, we built the system framework and prepared the infrastructure and methodology for testing the assistant.

As soon as the customer provided the data, we distributed the request classes among the developers and, working in parallel, quickly taught the bot to operate in the customer's technical support service.

Another problem was GigaChat itself: it gave different answers to the same questions, censorship was hard to disable, and certain topics (e.g., military-political ones) were handled poorly.

Architecture

The general architecture diagram of the solution based on the approach we have implemented looks like this:

Fig. 1. Architecture of the request processing solution

The architecture looks quite simple; apart from building the project framework, the main labor cost lies in working through each topic. In return, this approach allows incoming requests to be processed and classified, and the relevant pieces of structured text in the knowledge base to be found, with maximum accuracy.

Extracting and indexing text from incoming documents is a separate, independently run process. On this project, the OpenAI ChatGPT model was used for extraction because of the tight deadlines, but we also plan to move the extractor to GigaChat (we expect a lot of interesting things here!) and to implement an interface for uploading new documents and reindexing (an admin panel).

Results

As a result, we got a bot that answers the most frequent requests with an accuracy of over 90%. The bot has strict limits of authority, and when a request arrives that it should not answer, it forwards the request to a technical support employee. The bot cannot invent anything on its own and always works strictly according to the instructions embedded in it.

What's next?

We plan to improve the solution further: we already have some groundwork on a “Smart Search” framework for documentation with a Russified interface. We plan to build the “Knowledge Map” approach into it, and in the remaining cases, when nothing is found and a “Knowledge Map” for the topic has not yet been formed, the framework's standard vector-based approach will be used (better than nothing). In other words, we are moving toward a hybrid search, sketched below. For now this option seems optimal at this stage, but we expect the “Knowledge Map” approach to become better and more automated over time, at least within a specific area of knowledge.
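A rough sketch of what such a hybrid router could look like; knowledge_map, vector_search, and llm_complete are assumed interfaces describing a future design, not shipped code.

    # Sketch of the planned hybrid search: try the "Knowledge Map" first,
    # fall back to ordinary vector retrieval when no class matches.

    def hybrid_answer(query: str, knowledge_map, vector_search, llm_complete) -> str:
        query_class = knowledge_map.classify(query)
        if query_class is not None:
            # High-accuracy path: exact, curated instruction from the map.
            instruction = knowledge_map.lookup(query_class)
            prompt = (
                "Answer strictly using this instruction:\n"
                f"{instruction}\n\nQuestion: {query}"
            )
            return llm_complete(prompt)
        # No knowledge-map entry yet: classic RAG over the vector index.
        chunks = vector_search(query, top_k=3)
        prompt = (
            "Answer using the context below.\n\n"
            + "\n\n".join(chunks)
            + f"\n\nQuestion: {query}"
        )
        return llm_complete(prompt)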

We will be glad to provide this system for pilot use; write to us in PM if you are interested.

The general scheme for developing an assistant based on the “Knowledge Map” seems to us to be as follows:

Fig. 2. General scheme of assistant development
