Linguistic features of human speech in dialogues with a virtual assistant

Virtual assistants are usually built with machine-learning approaches and, of course, rule-based ones. Both (machine learning especially) rely on input data, which is usually human dialogue. However, the fact that users of dialogue systems do not communicate with them in the same way as with real people is typically not taken into account.

In this article, I will describe our experiment on the statistical analysis of the texts of human-to-human and human-to-chatbot dialogues. The dialogues covered frequently asked questions (FAQ) about COVID-19 and were conducted in Russian. The results were processed with numerical text-analysis methods (groups of metrics such as readability, syntactic complexity, and lexical diversity). Statistical analysis showed significant differences in these metrics: in dialogues with the chatbot, respondents used shorter words and sentences as well as simpler syntax, while lexical diversity and readability indices were markedly higher in human-to-human dialogues. All this suggests that people use simpler language when communicating with virtual assistants, so developers should pay special attention to preparing the initial data for creating such applications.


It is no secret that virtual assistants (chatbots in particular) have already become an active part of our lives. They follow us in banks, on government sites, and even when we just want to have fun. And even if Gartner’s forecast that by 2022 70% of white-collar workers would communicate with chatbots on a daily basis has not fully come true, the trend toward their adoption is more than noticeable. In my opinion, a technology such as virtual assistants is neutral: depending on whose hands it is in, its degree of usefulness or, on the contrary, hostility can change. In this article, I propose to focus on the useful side of virtual assistants, namely the ability to quickly and efficiently get an answer to a particular question.

As an example, consider a FAQ chatbot that answers user questions about COVID-19. Dialogue data was collected from March 2021 to July 2021, i.e., during one of the peaks of the pandemic. The data collection process was designed so that the two types of dialogues (human-to-human and human-to-chatbot) could be compared practically “ceteris paribus”. More details in the next section.

Creating a chatbot

First of all, it was necessary to collect a knowledge base in question-answer format. The data for the knowledge base was taken from open and official sources of information, such as the website stopcoronavirus.rf. The questions were manually paraphrased so that an appropriate classification model could be trained. Google Dialogflow and its intent classification module served as the technical platform, and the Telegram messenger served as the chatbot’s interface. The figure below shows the conceptual architecture of the chatbot.

Chatbot FAQ Conceptual Architecture
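To make the question-answer matching step concrete, here is a minimal sketch of how a FAQ question can be mapped to an answer by word overlap. This is only a toy stand-in for Dialogflow’s intent classification, and the knowledge-base entries below are hypothetical examples, not entries from the real bot:

```python
# Toy FAQ intent matching: pick the stored question that shares the
# most words with the user's question (the real system used Google
# Dialogflow's trained intent classifier instead).

def tokenize(text):
    """Lowercase a question and split it into word tokens."""
    return text.lower().split()

def match_intent(question, knowledge_base):
    """Return the answer whose stored question has the largest
    bag-of-words overlap with the user's question."""
    q_tokens = set(tokenize(question))
    best_answer, best_score = None, 0
    for stored_question, answer in knowledge_base.items():
        score = len(q_tokens & set(tokenize(stored_question)))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

# Hypothetical knowledge-base entries for illustration only.
knowledge_base = {
    "what are the symptoms of covid-19": "Common symptoms include fever, cough and fatigue.",
    "how is the vaccine administered": "Vaccines are given in two doses several weeks apart.",
}

print(match_intent("What symptoms does COVID-19 have?", knowledge_base))
# → Common symptoms include fever, cough and fatigue.
```

In the real architecture this step sits behind the Telegram interface, and the paraphrased questions collected for the knowledge base serve as training examples for the intent model.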

Experiment planning

To conduct the experiment, we divided the respondents into Group No. 1 (human-to-human dialogues) and Group No. 2 (human-to-chatbot dialogues). Let’s start with the second group, because everything is simple there: respondents are invited to conduct a dialogue directly with the chatbot, after an introduction to the subject of the dialogue and possible questions. In the first case everything is similar, except that respondents converse with a human expert, not knowing that the expert uses the same chatbot to answer their questions. In other words, the expert is a proxy between the respondent and the chatbot. This type of experiment is called an inverted Wizard of Oz. A description of the original Wizard of Oz can be found here. The figure below shows the scheme of the experiment from our original article.

Scheme of the experiment (in English)

This experiment ran from March 2021 to July 2021. As a result, 35 dialogues were collected for Group No. 1 (human-to-human) and 68 dialogues for Group No. 2 (human-to-chatbot). The data was anonymized and used in further analysis. The source code and data are in a GitHub repository. In the next section, I will provide detailed statistics on the measured metrics.

Results of the analysis of text dialogues

For the analysis of text dialogues, a number of metrics were selected for the following groups:

  • Descriptive statistics (average word length, average number of syllables, etc.);

  • Readability (Flesch reading ease, etc.);

  • Syntactic complexity (Mean dependency distance);

  • Lexical diversity (Type Token Ratio, Lexical Density, etc.).
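To make these groups concrete, a few of the metrics can be computed by hand in a few lines of Python. This is a simplified sketch (the study itself used the LinguaF library, and the sample sentence below is an invented example):

```python
import re

def words(text):
    """Extract word tokens (letters and hyphens) from text."""
    return re.findall(r"[a-zA-Zа-яА-ЯёЁ-]+", text)

def avg_word_length(text):
    """Descriptive statistics: average word length in characters."""
    ws = words(text)
    return sum(len(w) for w in ws) / len(ws)

def avg_sentence_length(text):
    """Descriptive statistics: average number of words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(words(text)) / len(sentences)

def type_token_ratio(text):
    """Lexical diversity: unique words divided by total words."""
    ws = [w.lower() for w in words(text)]
    return len(set(ws)) / len(ws)

sample = "Masks reduce transmission. Vaccines reduce severe illness."
print(round(avg_sentence_length(sample), 2))  # → 3.5
print(round(type_token_ratio(sample), 2))     # → 0.86
```

Readability indices such as Flesch reading ease are built from exactly these kinds of quantities (words per sentence, syllables per word), which is why the metric groups move together.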

Metrics were calculated using the Python library LinguaF. Descriptions of the metrics and how they are computed can be found in our original article. Below is a table with the calculated average values of the metrics, as well as t-test values that determine whether the difference in a given metric between the two dialogue types (Groups No. 1 and No. 2) is statistically significant (significance level α = 0.01).

Tables with calculated values ​​of metrics showing the linguistic difference of texts
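The significance test behind the table is a standard two-sample t-test. With SciPy, the comparison for a single metric looks roughly like this; the per-dialogue values below are invented illustrative numbers, not the study’s data:

```python
from scipy import stats

# Hypothetical per-dialogue values of one metric (e.g. average words
# per sentence) for the two groups -- NOT the real study data.
group1 = [9.8, 11.2, 10.5, 12.0, 9.1, 10.7]   # human-to-human
group2 = [5.2, 6.1, 4.8, 5.9, 6.4, 5.5]       # human-to-chatbot

# Welch's t-test (does not assume equal variances in the two groups).
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

# Reject the null hypothesis of equal means at significance level 0.01.
alpha = 0.01
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant: {p_value < alpha}")
```

A p-value below α = 0.01 for a metric means the observed difference between the two dialogue types is unlikely to be due to chance alone.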

It is clear that for most metrics there is a statistically significant difference. Let’s start with the simple ones. Metric No. 1, the average number of words per sentence, is almost twice as high in the dialogues of Group No. 1 as in Group No. 2. The same is true for metric No. 2, average sentence length. Average word length and the number of syllables per word are also slightly higher for Group No. 1.

The syntactic complexity metric (Mean Dependency Distance), which measures the average distance between dependent words in a sentence, is noticeably larger for human-to-human dialogues. This indicates a more complex sentence structure in the case of Group No. 1.
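Given a dependency parse represented as a list of head indices, Mean Dependency Distance is simply the average absolute distance between each word’s position and its head’s position. A minimal sketch (the parse below is a hypothetical five-token sentence, with 0 marking the root):

```python
def mean_dependency_distance(heads):
    """heads[i] is the 1-based position of the head of token i+1
    (0 means the token is the sentence root). MDD averages
    |dependent position - head position| over all non-root tokens."""
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)

# Hypothetical parse of a 5-token sentence: token 3 is the root,
# the other tokens attach to an immediate neighbour.
heads = [2, 3, 0, 3, 4]
print(mean_dependency_distance(heads))  # → 1.0
```

Sentences where every word modifies its neighbour give an MDD near 1; long-range attachments (subordinate clauses, inversions) push the value up, which is why MDD works as a proxy for syntactic complexity.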

The differences in metrics No. 6 and No. 9 are not statistically significant. However, the other lexical diversity metrics clearly show that the dialogues of Group No. 2 lag behind. All readability metrics showed a statistically significant difference, which again illustrates the more complex structure of the Group No. 1 dialogue texts.

Conclusion

In this work, we confirmed numerically that in a dialogue with a chatbot a person tends to use simplified linguistic constructions. This applies both to individual words and to sentence structure as a whole. The fact that users simplify their language should be an impetus for more careful collection of the initial data used to create chatbots.

Often, machine learning approaches and, of course, rule-based approaches are used to create virtual assistants. Both (mostly machine learning) rely on input data, which is usually human dialogue. At the same time, the fact that users of dialogue systems will not communicate with them in the same way as with real people is usually not taken into account. We recommend that all developers keep this in mind and make appropriate changes to the data and algorithms that underlie their systems.

For citation:

  title={Linguistic Difference of Human-Human and Human-Chatbot Dialogues about COVID-19 in the Russian Language},
  author={Perevalov, Aleksandr and Vysokov, Aleksandr and Both, Andreas},
  journal={Applied Innovations in IT},
