A New Approach for Text Classification in Chatbots

More and more projects need to classify incoming text for further processing. Before the recent boom in neural networks, text classification was labor-intensive, expensive, and required deep NLP expertise, and off-the-shelf solutions did not provide the desired accuracy. Fortunately, you can now implement many solutions almost instantly. Imagine a car dealer that receives hundreds of messages from customers every day. How do you quickly and accurately determine what each customer wants? With text classification.


Hello everyone, my name is Ivan Chetverikov, and I am an AI architect at Raft, where we implement advanced AI technologies in company processes across many industries. I will share our experience implementing a new approach to classifying text in chatbots. We will compare three ways to build a classifier: a classic ML classifier, an LLM classifier, and the semantic-router library. But before that, a little context on the project.

Selecting a classifier

First of all, we had to decide which classifier to choose in order to launch an MVP as soon as possible. A small analysis showed that user requests needed to be classified into about 20 different topics. You might think that an ML classifier is a great fit for static topics, but its development can take several months. Therefore, we decided to launch the MVP with an LLM classifier and move to an ML classifier once it was ready.

Advantages of the LLM classifier:

  1. No need for dataset: To develop an LLM classifier, you don't need to collect and label a large amount of data in advance. You just need to write a prompt, which significantly speeds up the process.

  2. Rapid development: The time for developing an LLM classifier is significantly shorter compared to a classic ML classifier. This allows you to launch an MVP and start testing faster.

Disadvantages of the LLM classifier:

  1. Limited accuracy: Without a pre-trained dataset, the accuracy of the LLM classifier may be lower than that of the ML classifier.

  2. Cost at high message volume: running an LLM classifier can be expensive for a heavily loaded service, and token consumption on cloud models can offset all the benefits of using it. In that case, however, you can deploy a local LLM.

Advantages of ML classifier:

  1. High accuracy: Given a well-labeled dataset, an ML classifier can provide high classification accuracy.

  2. Autonomy: The ML classifier can be deployed on-prem, eliminating dependence on third-party services.

Disadvantages of ML classifier:

  1. Long development time: Developing and training an ML classifier requires significant time and resource costs.

  2. Dataset requirement: Training an ML classifier requires a large amount of labeled data, which can be a labor-intensive process.

  3. Long update processes: Updating the model takes time and requires the involvement of an ML engineer.

I promised to compare three approaches to classifier implementation, but at this stage of the project semantic-router was not yet in the picture. For now, ML development began in parallel with the LLM classifier.

Step 1: LLM classifier

The LLM classifier was intended as a temporary solution, so we decided not to implement all topics and limited ourselves to the following set:

  • Search for information about a car by VIN number;

  • Car selection;

  • Information from the company's knowledge base (credit, trade-in, installment plan, etc.);

  • Information about office work;

  • Questions about switching to an operator.

This was done to speed up development and keep the prompt small. The prompt was written using the few-shot pattern (a method in which the neural network is given only a few examples); its final size varied from 800 to 1,000 tokens for the target model, gpt-3.5-turbo. The cost of one request was approximately $0.002–$0.003.

With only this data, it was difficult to compare the LLM classifier against the ML and semantic-router classifiers, but we tried anyway.

Its main advantage is speed of implementation. It took us no more than one or two weeks from the start of the project to a working dev environment. For a PoC or MVP stage, this is quite fast.

To improve accuracy, we used Few-Shot prompting with 3-4 examples for each topic. The classifier can also serve as a good layer to protect against prompt injections, as it discards all messages that are not related to the target ones.

The part of the prompt describing the categories, excluding business rules, incoming text processing rules, and response format, looks like this:

"""
You need to classify user input and return the best category that it match.
You must return exact name of the most appropriated category.
Only these categories are available:
"car-search-by-vin", "cars-search", "info", "office-info", "switch-to-operator"
Rules for categories:
Category "car-search-by-vin" - Any description
Category "cars-search" - Any description
Category "info" - Any description
Category "office-info" - Any description
Category "switch-to-operator" - Any
Examples of user input:
Ex. 1
Question: what's the mileage? how much does it cost?
VIN number: XXXXXXXXXXXXXXXXX
Expected category: "car-search-by-vin"
Ex. 2
Question: Any trade-in options?
Expected category: "info"
Ex. 3
Question: Need to discuss personal discount and car customization
Expected category: "switch-to-operator"
"""

Of the metrics for the LLM classifier, we measured only classification accuracy, which, given the low cost of the model, was a respectable 75–80%. It is also worth noting that any question the bot could not process was automatically routed to an operator.

If necessary, you can work with a local LLM by deploying the inference of any open-source model on the server, but it is worth considering that to achieve greater accuracy, you need to involve ML engineers and invest more effort in developing the prompt. This approach will be relevant if the project already has a locally deployed LLM or it will be used not only for classifying text data.

Step 2: ML classifier

As stated, machine learning models are often better suited for classification tasks than large language models when there is a training dataset.

At this stage of the project we had a prepared dataset and a chosen model (distilbert/distilbert-base-multilingual-cased) to fine-tune.

BERT is Google's well-known model for text processing tasks, including classification. The multilingual version is suitable for Russian, on which it has already been pre-trained. Distillation means the model takes up less space than its non-distilled counterpart and trains faster, with minimal loss in quality.
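A condensed sketch of what fine-tuning this model looks like with the Hugging Face transformers library is shown below. The dataset columns and the model name follow the article; everything else (function names, hyperparameters) is an illustrative assumption, not the project's training code.

```python
# Sketch: fine-tuning distilbert-base-multilingual-cased for topic
# classification. Hyperparameters are placeholders, not the project's values.

def build_label_maps(topic_names):
    """Map topic names to contiguous integer ids, as transformers expects."""
    label2id = {name: i for i, name in enumerate(sorted(set(topic_names)))}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def train(texts, labels, label2id, id2label):
    # Heavy imports kept inside the function; requires transformers + torch.
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    name = "distilbert/distilbert-base-multilingual-cased"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=len(label2id), label2id=label2id, id2label=id2label)

    enc = tok(texts, truncation=True, padding=True)

    class TopicDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in enc.items()}
            item["labels"] = torch.tensor(label2id[labels[i]])
            return item

    args = TrainingArguments(output_dir="topic-clf", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=TopicDataset()).train()
    return tok, model
```

Storing `label2id`/`id2label` inside the model config is what lets the trained checkpoint return human-readable topic names at inference time.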

At this stage there were no longer 5 topics, as before, but the full set of 20. The labeled dataset was collected in an Excel file. For clarity, examples of some of the topics are given below:

| clean_text | topic_name | topic |
| --- | --- | --- |
| How can I contact you? | Feedback | 1 |
| is the car available? | Cars in stock | 2 |
| is the car sold for cash? | Payment terms | 3 |
| Can I get it on credit? | Credit | 4 |
| 2,500,000 ready to take | Bargaining and discussing the price | 6 |
| Are there any painted parts? | Condition of the car | 7 |
| Can you send a report on the car? | Request diagnostics | 10 |
| Can I see the VIN? | VIN Request | 19 |

After training on the prepared and filtered dataset, the following result was obtained. The report was generated based on the scikit-learn library and the prepared test dataset:

ML model results

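For readers unfamiliar with these reports, the per-class table scikit-learn produces can be reproduced on toy data like this (the labels below are invented for illustration; the real report used the project's test dataset):

```python
# Toy illustration of scikit-learn's per-class classification report.
from sklearn.metrics import classification_report

y_true = ["Credit", "Credit", "Feedback", "VIN Request", "Feedback"]
y_pred = ["Credit", "Feedback", "Feedback", "VIN Request", "Feedback"]

# Human-readable table, like the reports shown in this article
print(classification_report(y_true, y_pred, zero_division=0))

# Machine-readable version, handy for automated quality gates
report = classification_report(y_true, y_pred, output_dict=True,
                               zero_division=0)
```

The `output_dict=True` form is convenient in CI: you can assert that per-topic precision and recall stay above a threshold before promoting a new model.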

After implementing the model in the service, responses became better than with the LLM model. But we wanted even higher accuracy, to fully relieve operators of standard customer questions.

Step 3: Semantic-router based classifier


So we reached the point where it became clear that updating the ML model was a long and labor-intensive process, which significantly slowed down all work on the project, so we looked for an alternative. Solutions based on PyTorch or NLTK did not suit us: deploying a local model made application deployment heavy and also created a large CPU load. The most suitable existing solution seemed to be semantic-router, since our stack was Python and OpenAI integration was already in place.

The semantic-router library is a “decision layer”: it classifies text into topics. It works on top of LLM and embedding models, supporting local models as well as cloud ones from OpenAI and Cohere.

To implement the bot version using semantic-router, the same dataset was used as for the ML classifier. Embeddings were computed with OpenAI's text-embedding-3-large model (cost: $0.13 / 1M tokens).

The logic for running the classifier:

  1. Optimization of the original dataset.

    Here we used the K-means method implemented in the scikit-learn library. The best result was achieved with the following parameters: data batch size = 1000, number of final records per class = 300, score threshold = 0.1.

  2. Loading the optimized dataset into embeddings and creating a class for our classifier.

    At this stage we need to create a list of routes for our embeddings. For this we used semantic_router.Route; these routes are then combined into a semantic_router.layer.RouteLayer.

An example of creating a semantic-router classifier is published on GitHub.
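The route-building step can be sketched as follows. The route names and dataset shape follow the article, but the helper functions are illustrative assumptions; building the layer requires the semantic-router package and an OpenAI API key.

```python
# Sketch: turning the labeled dataset into semantic-router routes.
# Function names are illustrative, not from the project's codebase.

def utterances_by_topic(rows):
    """Group (clean_text, topic_name) rows into {topic_name: [utterances]}."""
    grouped = {}
    for text, topic in rows:
        grouped.setdefault(topic, []).append(text)
    return grouped


def build_layer(rows):
    # Heavy imports kept inside the function; requires the semantic-router
    # package and OPENAI_API_KEY for the embedding encoder.
    from semantic_router import Route
    from semantic_router.encoders import OpenAIEncoder
    from semantic_router.layer import RouteLayer

    routes = [Route(name=topic, utterances=utts)
              for topic, utts in utterances_by_topic(rows).items()]
    encoder = OpenAIEncoder(name="text-embedding-3-large")
    return RouteLayer(encoder=encoder, routes=routes)


# Usage (needs network access):
# layer = build_layer([("Can I get it on credit?", "Credit"),
#                      ("Any trade-in options?", "info")])
# choice = layer("is installment possible?")
# print(choice.name)
```

Calling the layer returns a route choice whose name is the matched topic, so swapping this in for the ML classifier required no retraining step, only re-embedding the dataset.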

When the application starts, loading the dataset into the embedding model costs approximately $0.003 for a 5,000-record dataset (29,043 tokens). One classification query costs less than 50 tokens, which works out to about $0.065 per ~10k queries.
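As a sanity check, the arithmetic can be spelled out (prices and token counts are those stated above; the per-query figure is the stated upper bound of 50 tokens):

```python
# Back-of-envelope cost check for text-embedding-3-large at $0.13 / 1M tokens.
PRICE_PER_TOKEN = 0.13 / 1_000_000

dataset_tokens = 29_043  # one-time load of the 5,000-record dataset
query_tokens = 50        # stated upper bound per classification query

dataset_cost = dataset_tokens * PRICE_PER_TOKEN
cost_10k = 10_000 * query_tokens * PRICE_PER_TOKEN

print(f"dataset load ≈ ${dataset_cost:.4f}, 10k queries ≈ ${cost_10k:.3f}")
```

This confirms the one-time dataset load is under half a cent, and even the recurring query cost is orders of magnitude below the LLM classifier's $20–30 per 10k requests.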

Classification accuracy with this approach was initially 89%. We then manually optimized the embedding dataset, adding more examples for topics whose accuracy in client chats did not satisfy us. After optimization, overall accuracy reached 92–96%.

The resulting report from scikit-learn looks like this:

semantic-router results

Security of solutions based on classifiers

Additionally, classifiers implemented using the listed methods differ in their resistance to attacks such as prompt injection. But that is a topic for a separate article; here we are focused on implementing the classifiers themselves.

Results

Of course, for this car dealer's topic-classification task, semantic-router showed the best result. However, achieving high accuracy still required manually labeling the dataset and tuning the parameters for converting data into embeddings.

Based on the experience gained, we can make some small recommendations:

  • An LLM classifier is worth using when there is no prepared dataset and you want to launch the project as soon as possible. Note that at high message volumes this classifier can be quite expensive, and with cloud models latency will also be higher than in the other solutions.

  • An ML classifier suits on-prem deployments or projects that ban third-party services (like OpenAI in our example). It requires a labeled dataset and additional server resources; in return, all messages are processed at no additional cost. Latency depends on the resources allocated to the classifier.

  • A classifier based on semantic-router also requires a labeled dataset, but it can be deployed much faster: the model does not need to be trained, the data only needs to be converted to embeddings and wrapped in the framework's functionality. The cost of this solution is low, which lets you increase throughput without paying for an LLM classifier. Latency is closer to the ML classifier than to the LLM one, since calls to the embedding model are much faster.

Cost of solutions:

  • LLM classifier: on the OpenAI model gpt-3.5-turbo-1106, $0.002–$0.003 per request with a 1,000-token prompt; $20–30 per 10k requests.

  • ML classifier: $0, since it runs on the server's own capacity.

  • semantic-router classifier: on the OpenAI model text-embedding-3-large, $0.003 per application launch and ~$0.065 per 10k requests.

I would also like to add that Yandex recently released a lightweight trainable classifier model, and in the near future we want to test it and compare it against the results obtained here.

Text classification is a powerful tool that can significantly improve customer service and optimize processes. Try one of the approaches in your projects and share your results in the comments!
