AI Chatbot for English Pronunciation and Grammar Training

In the era of globalization, English has become a key skill for personal and professional development. At the same time, modern technologies such as artificial intelligence and speech recognition services open up new opportunities for language learning. In this article, I will tell you how I created Thought, a chatbot that helps users improve their English with a focus on pronunciation and grammar, and share the technical details behind this project.

The main idea and choice of technologies

The idea for Thought arose from the need to create a convenient tool that could help people improve their pronunciation and other aspects of the English language. Existing solutions are often either too complex to use or require significant resources. I decided to combine the simplicity and accessibility of Telegram and VKontakte messengers with the power of Yandex SpeechKit and GPT-4 to create an educational chatbot available to everyone.

GPT-4 is used as an AI tutor that not only analyzes user errors but also offers personalized training and recommendations. Thanks to GPT-4, Thought is able to conduct a dialogue, provide detailed feedback, and adapt tasks to the user's level.

For those interested, the system prompt for GPT:

Thought is a chatbot for English pronunciation training. The bot gives phrases in English, the user must pronounce them, and the bot shows where the pronunciation errors are. For example, you write the phrase “Hello, world”, and “hello would” is recognized from the user's voice message. This means that the user made certain mistakes in pronouncing the word “world”. Analyze these errors and help the user correct them. The user can talk to you as with a real tutor, and you can answer questions in any form, the way a person would. The user has the buttons “modes” and “help”. If they write “help”, say that you are ready to answer any of their questions. Be as clear as possible to the user, even if they do not know many terms. For example, most users do not know the words “bot”, “AI”, “assistant”, and so on. Call yourself “Thought virtual tutor”.
You can give the user points, or ratings, or rate according to some international rating standards. You can periodically insert interesting facts about pronunciation in English.

There are many different training modes:

Sprint: A short session of 20 phrases for the user to say. Number each phrase.
Marathon: A long session with a variety of tasks (pronunciation, listening, grammar) with no time limit, but with the goal of achieving a certain number of points or a certain grade.
Thematic: Focus on a specific topic with relevant vocabulary and phrases. Allow the user to come up with the topic themselves.
Dialogue: Simulates a conversation where the bot asks questions or makes comments, and the user must respond appropriately.
Make Words: The user is given a set of letters and must make as many English words as possible.
Correct the Errors: A series of sentences with syntactic and lexical errors that the user must find and correct.
Translation at speed: Translation of phrases from Russian to English and back.
Complex: A combination of different types of tasks with increasing difficulty, as in a game.

Write phrases for training, one per message. Also write transcriptions for phrases/words. Remind the user in which mode the training is currently taking place.

Operating principle and pronunciation quality assessment

The main challenge was to create an accurate and intuitive pronunciation evaluation system. It was important that users could easily understand where they made a mistake. To achieve this, the following algorithm was implemented:

1. Suggestion of the phrase: The bot offers the user a phrase in English along with a transcription. The transcription can follow American or British pronunciation, depending on the user's preferences.
2. Audio example: For convenience and better understanding, the bot sends an audio example of the correct pronunciation of the proposed phrase.
3. Recording and recognition: The user records a voice message pronouncing the phrase, and this message is sent to the speech recognition service (in this case, Yandex SpeechKit).
4. Comparison and evaluation: The recognized phrase is compared with the original text. Yandex SpeechKit often offers several variants of the recognized text. The evaluation follows this principle:
- Green: If the words in the spoken phrase completely match the original phrase, they are considered to be pronounced correctly.
- Yellow: If words or characters match the phrase suggested by the bot only partially, with some deviations, they are marked in yellow.
- Red: If words are not recognized, they are considered mispronounced and are marked in red.

This assessment method allows the user to clearly see their mistakes and work on correcting them.
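The green/yellow/red rule above can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the function names, the 0.6 similarity threshold, and the decision to treat a word found only in an alternative recognition variant as yellow are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def grade(expected: str, hypotheses: list[str], threshold: float = 0.6) -> list[tuple[str, str]]:
    """Mark every word of the expected phrase as green, yellow, or red.

    hypotheses[0] is treated as the main recognition result, the rest as
    alternative variants (an assumption of this sketch).
    """
    target = tokenize(expected)
    if not hypotheses:
        return [(word, "red") for word in target]

    heard = tokenize(hypotheses[0])
    alternative_words = {w for h in hypotheses[1:] for w in tokenize(h)}
    marks = ["red"] * len(target)

    # Align the expected words with the recognized words.
    matcher = SequenceMatcher(None, target, heard, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                marks[i] = "green"
        elif tag == "replace":
            # A substituted word is yellow if it is close to what was heard.
            for i in range(i1, i2):
                if any(SequenceMatcher(None, target[i], h).ratio() >= threshold
                       for h in heard[j1:j2]):
                    marks[i] = "yellow"
        # "delete" means the word was not heard at all, so it stays red.

    # A word missing from the main result but present in another variant is yellow.
    for i, word in enumerate(target):
        if marks[i] == "red" and word in alternative_words:
            marks[i] = "yellow"

    return list(zip(target, marks))


print(grade("Hello, world", ["hello would"]))
# [('hello', 'green'), ('world', 'yellow')]
```

With the example from the system prompt, "world" comes out yellow because "would" is close to it, while a phrase where "world" is not heard at all would leave it red.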

One of the key tasks was to optimize work with Yandex SpeechKit. Voice messages had to be processed as quickly and accurately as possible so that the user would not notice any delay. To achieve this, I had to experiment with various API parameters and optimize the code.
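For readers who want to try this themselves, here is a rough sketch of how a voice message could be prepared for recognition. The article does not say which SpeechKit API version the bot uses, so this assumes the synchronous REST recognition endpoint with API-key authorization and OGG Opus audio (the format of Telegram voice messages); that endpoint returns a single text in a "result" field. The function names are hypothetical, and the parameters should be checked against the current SpeechKit documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: SpeechKit synchronous recognition (REST API v1).
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_stt_request(audio: bytes, api_key: str, lang: str = "en-US") -> Request:
    """Prepare (but do not send) a recognition request for one voice message."""
    query = urlencode({"lang": lang, "format": "oggopus"})
    return Request(
        f"{STT_URL}?{query}",
        data=audio,  # raw bytes of the voice message
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


def parse_stt_response(body: bytes) -> str:
    """Extract the recognized text from the JSON response body."""
    return json.loads(body).get("result", "")
```

The request can then be sent with urllib.request.urlopen(request, timeout=10); an explicit timeout keeps a slow recognition call from stalling the conversation.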

Another challenge was integration with GPT-4. It took time to tune the prompt so that the tutor would give relevant, useful advice that takes the user's level into account. It was important that GPT-4 did not overload the user with complex terms and stayed as simple and understandable as possible.

Training modes

Thought offers two main ways to train:

Easy mode: In this mode, the user practices on pre-set phrases, which is ideal for beginners looking to improve their basic pronunciation skills.
Advanced mode: This mode includes an AI tutor based on GPT-4, which offers a variety of training scenarios. Each scenario is adapted to a different level of training and different user goals, making the learning process flexible and effective.

Conclusion

You can check out Thought on Telegram: t.me/enthoughtbot or VKontakte: vk.com/enthought. The project continues to develop, and I will be glad to hear feedback and suggestions for improving it. The main problem for me now is not the technical side of development but designing the most convenient and understandable interface and spotting the shortcomings that I do not notice myself. I am also starting a channel about chatbot development and about IT and AI in general. It is empty for now, but content will appear soon, so subscribe: t.me/courseknowledge.