Secrets of NLU Systems Testing

This is where NLU (natural language understanding) testing comes into play. NLU testing is a complex process, because there is no precise way to tell when testing is complete. There are, however, some practical testing rules that help, and I would like to discuss them in this article. The material is divided into two blocks: the personal experience of a QA engineer and testing with neural networks.

Testing on personal experience

As a rule, when testing NLU, a QA engineer relies on a scenario (design) and starts with the following inputs: a topic, a list of questions or activator phrases that should trigger the topic, and the expected response of the bot (a reply, a transition, sending an email, and so on).

Obviously, the user can ask a question in different ways, and our task is to teach the bot to recognize as many of these variations as possible. For testing, use your own vocabulary to the fullest and also try to imagine yourself as another person: grandma Nadya from the next entrance of your building, an influencer from a trendy coffee shop, or the mechanic Valentin, who peppers every sentence with obscenities.

When testing, you can rely on the following tips:

1) Select the keywords in the user's intent and match synonyms to them according to the topic.

Activate: open, issue, launch, turn on

Pass: card, certificate

2) Think about colloquial word forms. When communicating with a bot, users often modify words, replace them with colloquial variants, or use borrowings. Abbreviations (such as “qty” for “quantity”) are also common in text bots.

Check balance = View balance

Program = software

Money = cash

Help = give a hand

3) Sometimes it is impossible to find a synonym for a word, or it is difficult to come up with one that does not overlap with another topic. In such cases, you can use any available dictionary of synonyms.

4) The text of the answer can help you think of variants of the user's request. For example, the answer “You can activate your corporate pass on our portal, at the information desk in the main hall or via quick access terminals” names specific activation methods, and user questions may be related to them:

– Is it possible to activate the card via a quick access terminal?

– Where in the main hall can I activate the card?

5) Create a pool of speech patterns that reflect how the user may communicate:

– Getting straight to the point: Where… / How… / In what way…

– In business style: Tell me how…

– Expressing your need: I want… / I need…

– Describing the situation: Here is the situation…

– Asking for help: I can't understand… / Help me pay…

– Addressing someone: Who…/To whom…

– Waiting: I am waiting… / When…

– Asking about the possibility: Is it possible…

6) Don't forget to take into account the different ways a word or phrase can be recognized. Voice bots work with an automatic speech recognition (ASR) system, and under some circumstances, such as interference on the line, the user's speech is not recognized accurately. During manual testing, incorrect recognition variants are recorded and, if they are frequent or do not interfere with the logic, added to the tests.

Activate = format

Card = sweater

SMS = neighbor

7) Take into account typical defects that recur across bots. For example, the bot can assign phrases such as “For now yes” or “I haven’t decided yet” to the Farewells topic because of the keyword “Bye” (in Russian, the word “пока” means both “for now” or “yet” and “bye”). The refusal “Yes, no, probably” or the phrase “we can find out” is sometimes incorrectly routed to Agreement. When asked “Are you ready to take a short survey?” or “So, shall we start playing?”, the bot often fails to recognize “ready” or “start” as consent. Add such phrases to the checks, based on real user requests and their context.

8) Formulate user questions by combining variations from the previous points. Aim for about 15 different phrases.
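Tips 1, 5 and 8 can be partly mechanized. The sketch below is only an illustration, not a tool from this article: it combines the synonym lists and speech patterns from above into candidate phrases and samples about 15 of them. The list contents, the function name `build_test_phrases` and the fixed seed are assumptions made for the example, and the generated phrases still need a human review for naturalness.

```python
import itertools
import random

# Synonyms in the spirit of tip 1 (illustrative lists, extend them for your project).
ACTIONS = ["activate", "open", "issue", "turn on"]
OBJECTS = ["pass", "card", "certificate"]

# Speech patterns from tip 5; {action} and {obj} are placeholders.
PATTERNS = [
    "How do I {action} the {obj}?",
    "Where can I {action} the {obj}?",
    "Tell me how to {action} the {obj}",
    "I need to {action} the {obj}",
    "Is it possible to {action} the {obj}?",
    "I can't understand how to {action} the {obj}",
]


def build_test_phrases(limit=15, seed=0):
    """Return up to `limit` distinct phrases sampled from all combinations."""
    combos = [
        pattern.format(action=action, obj=obj)
        for pattern, action, obj in itertools.product(PATTERNS, ACTIONS, OBJECTS)
    ]
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    return rng.sample(combos, min(limit, len(combos)))


phrases = build_test_phrases()
for phrase in phrases:
    print(phrase)
```

Sampling instead of taking every combination keeps the set close to the suggested 15 phrases while still mixing patterns and synonyms.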

It is worth saying that a set of test phrases can never fully cover all cases. The language is so rich in words, word formations, and neologisms that it is impossible to foresee them all. Do not despair if you selected 20 synonyms during testing and the user came up with a 21st. Relax and calmly teach the NLU to understand various non-trivial phrases.
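The recurring defects from tip 7 and the recorded ASR misrecognitions from tip 6 are natural candidates for a small regression set that is rerun after every change. The following is a minimal sketch under stated assumptions: the topic names (`farewell`, `agreement`, `activate_pass`), the `Case` structure and the `naive_keyword_bot` stand-in are all hypothetical, and in a real project `classify` would call your own NLU.

```python
from typing import Callable, NamedTuple, Optional


class Case(NamedTuple):
    phrase: str
    expected: Optional[str] = None   # topic the bot must return, if set
    forbidden: Optional[str] = None  # topic the bot must not return, if set


CASES = [
    Case("For now yes", forbidden="farewell"),
    Case("I haven't decided yet", forbidden="farewell"),
    Case("Yes, no, probably", forbidden="agreement"),
    Case("ready", expected="agreement"),
    Case("start", expected="agreement"),
    # ASR heard "format" instead of "activate" (tip 6).
    Case("format the pass", expected="activate_pass"),
]


def run_regression(classify: Callable[[str], Optional[str]], cases=CASES):
    """Return (case, predicted_topic) pairs that violate the expectation."""
    failures = []
    for case in cases:
        topic = classify(case.phrase)
        wrong_topic = case.expected is not None and topic != case.expected
        hit_forbidden = case.forbidden is not None and topic == case.forbidden
        if wrong_topic or hit_forbidden:
            failures.append((case, topic))
    return failures


def naive_keyword_bot(phrase: str) -> Optional[str]:
    """Hypothetical stand-in for a real NLU call: one keyword per topic."""
    keywords = {"yes": "agreement", "bye": "farewell"}
    for word in phrase.lower().replace(",", " ").split():
        if word in keywords:
            return keywords[word]
    return None


failures = run_regression(naive_keyword_bot)
for case, topic in failures:
    print(f"FAIL: {case.phrase!r} -> {topic}")
```

Keeping both an expected and a forbidden topic lets a case say “must not be a farewell” without fixing what the correct topic is, which matches how these defects are usually described.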

Testing with neural networks

It is no secret that the breakthrough of the last two years is the release of ChatGPT and similar language models, which can generate a wide variety of thematic texts in a couple of seconds. In many fields, this has become a rescue from routine tasks or a source of inspiration in monotonous work. So why not take advantage of this and apply the technology to NLU testing?

We work with the Jay Copilot tool because it offers several of the most popular neural networks to choose from, including Russian ones. For example, YaGPT sometimes generates texts in Russian better than ChatGPT does.

We tried several methods on different cases to make NLU testing easier:

  • You can start with something simple and ask the neural network to generate synonyms for user requests based on those in the scenario. The advantage of this approach is that you don't have to spend a lot of time coming up with a context for the request, and you quickly get an acceptable result.

Think of 5 synonymous phrases for the question «How to activate the pass?»

  • The reverse option: based on the available answer, you can ask the neural network to generate user queries.

Based on this answer «You can activate your corporate pass on our portal, through the information desk in the main hall or through quick access terminals» think of 5 questions

  • Give the neural network more context. To spark its imagination, ask it to imagine itself as someone else in a certain situation.

You are a new employee of a large company and now you are talking to a voice assistant on the phone. You need to activate the pass, but this is your first time in the office and you don’t know how it works. Write 5 questions that you will ask the bot to find out how to activate the pass. Make your lines in a conversational style, use interjections

When generating questions in a given context, you can also specify the style of speech, regional characteristics, use of jargon and borrowed words.

  • It is also worth remembering the disadvantages of the phrase generation approach. Not all models are good at generating conversational speech, and you have to work hard to get anything more than added interjections. The choice of model can also play a significant role: for generating phrases in Russian, it is better to choose Russian models, since they are trained on more Russian-language texts.
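The three prompt styles above can be kept as templates so that the same requests are reused across topics. This is a rough sketch, not the Jay Copilot API: the function names are made up for the example, and `generate` is a hypothetical callable that you would replace with a call to whichever model you use. Here it is replaced by `fake_generate`, which returns canned lines so the example runs offline.

```python
from typing import Callable, Iterable, List


def synonyms_prompt(question: str, n: int = 5) -> str:
    return f"Think of {n} synonymous phrases for the question «{question}»"


def questions_from_answer_prompt(answer: str, n: int = 5) -> str:
    return f"Based on this answer «{answer}» think of {n} questions"


def persona_prompt(persona: str, goal: str, n: int = 5,
                   style: str = "conversational") -> str:
    return (
        f"You are {persona}. You need to {goal}. "
        f"Write {n} questions that you will ask the bot. "
        f"Make your lines in a {style} style, use interjections"
    )


def collect_phrases(prompts: Iterable[str],
                    generate: Callable[[str], List[str]]) -> List[str]:
    """Send each prompt to `generate`; return unique phrases in first-seen order."""
    seen = set()
    unique = []
    for prompt in prompts:
        for phrase in generate(prompt):
            key = phrase.strip().lower()  # case-insensitive duplicate check
            if key and key not in seen:
                seen.add(key)
                unique.append(phrase.strip())
    return unique


def fake_generate(prompt: str) -> List[str]:
    """Placeholder for a real model call; always returns the same canned lines."""
    return [
        "How do I activate the pass?",
        "how do I activate the pass? ",
        "Where do I turn on my badge?",
    ]


prompts = [
    synonyms_prompt("How to activate the pass?"),
    persona_prompt("a new employee of a large company", "activate the pass"),
]
candidates = collect_phrases(prompts, fake_generate)
print(candidates)
```

Models often repeat themselves across prompts, so removing duplicates before the phrases go into the test set saves review time. The remaining candidates are still only suggestions for the tester to accept or reject.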

Test until you're blue in the face

In conclusion, I would like to say that testing natural language understanding is a complex and multifaceted process. When creating a test set of phrases, consider the context in which the bot will be used: its purpose, the business area, the age of the users, and the language styles and speech patterns they are likely to use. Testers can use texts generated by neural networks as an additional tool or as an “outsider’s view of the situation” to support their expert assessment.

It is important to accept that you will not be able to test everything, and that the bot will not learn to understand all user requests the first time either. There is no point in sitting around endlessly trying to fix everything: each project has its own priorities and deadlines. The main thing is to bring the system to a certain good indicator, at which communication with it becomes a joy, not a burden. As a “good indicator”, you can take the point at which the bot correctly processes the scenario phrases as well as the most frequently encountered synonyms.
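One way to make the “good indicator” measurable is to track the share of passed checks per topic and look only at the topics that fall below an agreed level. The sketch below assumes that check results are available as `(topic, passed)` pairs; the 0.9 threshold and the topic names are arbitrary examples, not recommendations from this article.

```python
from collections import defaultdict


def pass_rate_by_topic(results):
    """results: iterable of (topic, passed) pairs -> {topic: share of passed checks}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for topic, ok in results:
        totals[topic] += 1
        passed[topic] += int(ok)
    return {topic: passed[topic] / totals[topic] for topic in totals}


def topics_below(results, threshold=0.9):
    """Return topics whose pass rate is under the assumed threshold, sorted by name."""
    rates = pass_rate_by_topic(results)
    return sorted(topic for topic, rate in rates.items() if rate < threshold)


# Example input: scenario phrases and frequent synonyms, already checked.
results = [
    ("activate_pass", True), ("activate_pass", True), ("activate_pass", False),
    ("farewell", True), ("farewell", True),
]
weak_topics = topics_below(results)
print(weak_topics)
```

The threshold itself is a project decision that depends on priorities and deadlines, so it is a parameter here rather than a fixed value.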

Do you have any life hacks for testing bots? Or maybe you know some other non-trivial options for using neural networks in testing? Be sure to share in the comments!
