How Generative AI Can Help with IT Interview Preparation


Hi all! My name is Alexander Meshkov, and I am the author of TestGrow, a free platform for testers. Recently I have been devoting quite a lot of time to generative artificial intelligence and the ways it can be applied to various aspects of learning.

With the advent of Large Language Models (LLMs), the IT industry has begun to change rapidly. Many IT specialists and companies already use various AI-based tools for routine tasks. In addition, the constant development and training of new models by the big players (Google with Gemini, Microsoft with OpenAI, and Meta with Llama) means that neural-network-generated answers keep getting better, so, however blunt it may sound, in the long term I expect significant changes and the integration of AI into many aspects of our work.

There is a lot to be said about AI in general, but in this article I would like to share my new AI project for interview practice and walk you through some of the technical aspects of implementing a voice interviewer.

The project has already launched, it will stay free (until the requests start costing me serious money), and you can try a practice interview yourself. The bot is available via the link.

Now let me tell you about the technical details of its implementation.

I think many of you who regularly use LLMs, both in everyday life and at work, know that the same ChatGPT, given proper instructions, can not only generate content for you but also act as a teacher or evaluator. Given your specific evaluation criteria as input, an LLM can apply those criteria to whatever input the model must process. To organize a practice interview with AI, the LLM's work had to be split into two parts (a minimal sketch follows the list):

  1. Generate a list of questions that is complete and matches the position the person wants to interview for

  2. Give feedback on each answer once the person has responded.
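
As a minimal sketch of this two-part split (the prompt texts and function names here are my own simplified illustration, not the production prompts), the flow could look roughly like this:

from openai import OpenAI

client = OpenAI()

def generate_questions(position, model):
    # Stage 1: ask the model for a question list for the target position
    response = client.chat.completions.create(
        model=model,  # which model to use at each stage is discussed below
        messages=[{
            "role": "user",
            "content": f"Prepare a list of interview questions for the position: {position}"
        }]
    )
    return response.choices[0].message.content

def evaluate_answer(question, answer, model):
    # Stage 2: ask the model to assess the candidate's answer and give feedback
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Question: {question}\nCandidate's answer: {answer}\n"
                       "Evaluate the answer and suggest how it could be improved."
        }]
    )
    return response.choices[0].message.content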

In general, the task looks quite simple. So the first step was to understand which model is best to use at each stage. Why does this matter? When working with generative AI, you have to weigh three things: response speed, the cost of using the model (in tokens), and response quality. You always need to find a balance that, at minimal cost, gets you an answer quickly enough and of sufficient quality. You can always look at the most popular models on Hugging Face. At the time of writing, the leaders in the quality of answers in Russian (we do live in the Russian Federation, after all) were the following models (honestly, I have not yet tried or studied YandexGPT from Yandex or GigaChat from Sber, so if anyone has experience with them, please share it in the comments):

(Screenshot: leaderboard of models by quality of answers in Russian)

Of the three most developed LLM families, I chose OpenAI's models because:

  • Llama – as an analytical model it is still rather weak, and deploying it on your own infrastructure to handle a large volume of requests can cost a pretty penny.

  • Gemini – quite a good competitor to OpenAI in terms of response quality, but Google's paid speech-to-text and text-to-speech services, at $0.024 per minute of processing, look a little expensive for a free application when OpenAI offers a price four times lower, $0.006 per minute of processing.

  • OpenAI – I have been working with these models for quite a long time, and I like the quality of their responses, the developer documentation, and the convenient API and libraries for speech-to-text and text-to-speech. There is no need to reinvent the wheel; OpenAI has already taken care of everything for us, which is undoubtedly very convenient.

Comparison of the gpt-4o, gpt-4o mini, and o1-preview models.

So, when it comes to choosing models for an application, if you do not have a large budget behind you, it is worth thinking about whether it is rational to use the newest and most expensive models for every task. That is why I initially had to run small tests to determine which model works best at each stage of the interview, judged by the same criteria: answer quality, token cost, and speed.

From the point of view of cost and response speed, the new o1-preview model was eliminated immediately: the chain-of-thought technique it uses takes a long time to process requests, and the time to receive a response can reach 30-60 seconds. On top of that, the model costs about six times more than gpt-4o: $15.00 / 1M input tokens and $60.00 / 1M output tokens.

Next, the choice was between the gpt-4o and gpt-4o mini models. The advantage of gpt-4o mini is that it is many times cheaper than the regular 4o model: $0.150 / 1M input tokens and $0.600 / 1M output tokens for gpt-4o mini versus $2.50 / 1M input tokens and $10.00 / 1M output tokens for gpt-4o.
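
To get a feel for what that difference means per interview, here is a back-of-the-envelope estimate (the token counts are my own assumption, purely for illustration):

# Hypothetical interview: ~20k input tokens (prompts + answers), ~10k output tokens
input_tokens, output_tokens = 20_000, 10_000

# Price per 1M tokens in USD, as quoted above
prices = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.150, "output": 0.600},
}

for model, p in prices.items():
    cost = input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]
    print(f"{model}: ${cost:.4f} per interview")

# gpt-4o:      $0.1500 per interview
# gpt-4o-mini: $0.0090 per interview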

A little off-topic.

By the way, about one of the latest features from OpenAI: just last week they announced prompt caching, which can further reduce the cost of using their models by up to half. Details about prompt caching can be found in the documentation: https://platform.openai.com/docs/guides/prompt-caching
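
As far as I understand the documentation, caching is applied automatically to prompts longer than about 1,024 tokens, and only an identical prompt prefix is reused. So the practical takeaway is to keep the long static part (instructions, evaluation criteria) at the start and the variable part (the candidate's answer) at the end. A rough sketch (LONG_STATIC_RUBRIC is a hypothetical constant holding the evaluation instructions):

messages = [
    # Long, static instructions first: this prefix can be cached between requests
    {"role": "system", "content": LONG_STATIC_RUBRIC},
    # Variable content last, so it does not invalidate the cached prefix
    {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)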

In general, I decided to test these two models a little and share the results with you. Below are two example requests that show the quality of the responses from these models, one for generating questions and one for giving feedback on a user's answer (P.S. for testing I used simpler prompts).

(Screenshot: questions generated by gpt-4o mini)

(Screenshot: questions generated by gpt-4o)

What conclusions can be drawn? gpt-4o generates more coherent questions: they are ordered by difficulty and include practical tasks, unlike gpt-4o mini, which prepared mostly theory questions. That said, with a modified prompt and a few extra conditions, the smaller model also copes with this task quite well.
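
For example, the kind of extra conditions that noticeably improve gpt-4o mini's output look roughly like this (an illustrative prompt, not my production one):

prompt = (
    "Prepare 10 interview questions for a junior Java developer.\n"
    "Conditions:\n"
    "1. Order the questions from easiest to hardest.\n"
    "2. Include at least two small practical coding tasks.\n"
    "3. For each question, note which topic it checks."
)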

Next, a test of analyzing a person's answer:

(Screenshot: answer analysis by gpt-4o mini)

(Screenshot: answer analysis by gpt-4o)

Here the difference really shows. In my request I deliberately left out one OOP principle, inheritance, and as you can see from the models' responses, gpt-4o immediately pointed this out in its analysis, while gpt-4o mini missed the omission. Also, gpt-4o's answer clearly tells me what I should learn and how best to answer the question specifically for my position as a junior Java developer, and these hints run throughout the answer, whereas gpt-4o mini only touches on this at the end of its message. And overall, gpt-4o's answer simply reads more naturally.

So, when planning to build solutions on top of existing generative AI models, always do an initial analysis to understand how well the models will handle your requests. As for me, for generating questions I have so far preferred gpt-4o mini, slightly adjusting the prompt and extending it with instructions, while I left the analysis of answers to gpt-4o. In code, this per-stage choice boils down to a tiny dispatch table, as sketched below.
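
(A minimal sketch; the stage names are my own.)

# Which model handles which stage of the interview
MODEL_BY_STAGE = {
    "generate_questions": "gpt-4o-mini",  # cheaper, good enough with a tuned prompt
    "evaluate_answer": "gpt-4o",          # stronger, more position-specific feedback
}

def ask(stage, messages):
    return client.chat.completions.create(
        model=MODEL_BY_STAGE[stage],
        messages=messages,
    )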

Text-to-speech and speech-to-text technologies

Now let's move on to the technical implementation of the application.

As I wrote earlier, in order not to reinvent the wheel, I decided to use standard solutions from OpenAI, which are available in their API documentation.

Firstly, these are models trained by OpenAI themselves, so they will obviously process and generate audio much better; secondly, you don't need to write any logic yourself: literally 3-4 lines of Python code solve the whole problem.

An example of speech-to-text using the Whisper model (link to documentation):

from openai import OpenAI

client = OpenAI()

def speech_to_text(audio_file_path):
    with open(audio_file_path, "rb") as audio_file:
        # Use the OpenAI API to transcribe the speech
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    return transcription.text

Text-to-speech example using the TTS model (link to documentation):

from openai import OpenAI

client = OpenAI()

def text_to_speech(text_input, output_file_name="speech.mp3", voice="nova"):
    # Send the text to the OpenAI TTS API
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text_input
    )
    # Save the synthesized speech to an MP3 file
    response.stream_to_file(output_file_name)
    return output_file_name

Another advantage of using Whisper is its fairly good support for Russian. Measured by WER (word error rate) or CER (character error rate), the model shows a fairly low level of errors.

(Chart: Whisper recognition error rates)
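
For intuition, WER is the word-level edit distance between the recognized text and a reference transcript, divided by the reference length. A quick way to spot-check recognition quality on your own recordings might look like this (a minimal sketch, not how the published numbers are computed):

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: (substitutions + deletions + insertions) / reference length
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("I write tests in Python", "I write test in Python"))  # 0.2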

For comparison, I also tried free libraries, both for recognizing the voice response and for the reverse direction, but unfortunately the quality of the free SpeechRecognition and pyttsx3 libraries so far leaves much to be desired.

Recording and processing voice responses

If we take the project as a whole, this part was probably the most difficult. Why? Well, firstly, I chose plain JS to implement the front-end of the application (unfortunately, I don't know modern Vue, Node.js and the like), and it turns out that recording voice in JS in the browser has a lot of nuances related to cross-browser compatibility and to support for particular voice-recording technologies. And we are talking not only about desktop browsers, but about mobile ones too.

A little about how voice is processed in the browser in general. Working with voice recording in the browser is divided into 3 parts:

  1. Gaining access to the microphone

  2. Recording and processing a voice message

  3. Saving it to a media file

To work with microphones in JS, the MediaDevices API is most often used.

This API accesses microphones through the navigator object in the Browser Object Model (BOM).

Navigator is an object used to obtain various information about the browser, network connection, operating system, and so on. Since a user will often have several microphones connected, it is advisable to implement microphone selection in JS separately.

async function getMicrophones() {
    const devices = await navigator.mediaDevices.enumerateDevices();
    const audioDevices = devices.filter(device => device.kind === 'audioinput');
    const microphoneSelect = document.getElementById('microphoneSelect');

    // Clear the list before filling it
    microphoneSelect.innerHTML = '';

    audioDevices.forEach((device, index) => {
        const option = document.createElement('option');
        option.value = device.deviceId;
        option.text = device.label || `Microphone ${index + 1}`;
        microphoneSelect.appendChild(option);
    });

    microphoneSelect.addEventListener('change', (event) => {
        selectedMicrophoneId = event.target.value;
    });

    if (audioDevices.length > 0) {
        selectedMicrophoneId = audioDevices[0].deviceId; // Set the default microphone
    }
}

Next, we move on to the most difficult part of the whole process: the actual recording and processing of the voice response. Initially I tried to simply use the MediaRecorder API.

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
        // Create a recorder bound to the microphone stream
        const mediaRecorder = new MediaRecorder(stream);
        let voice = [];

        // Start recording on the Start button
        document.querySelector('#start').addEventListener('click', function () {
            mediaRecorder.start();
        });

        // Collect recorded chunks as they become available
        mediaRecorder.addEventListener("dataavailable", function (event) {
            voice.push(event.data);
        });

        // Stop recording on the Stop button
        document.querySelector('#stop').addEventListener('click', function () {
            mediaRecorder.stop();
        });
    });

But the first cross-browser tests revealed problems with microphone access: for example, Yandex Browser refused to grant access, and in desktop Safari no microphone was selected by default. On top of that, results on mobile browsers were very unstable, despite the fairly broad cross-browser compatibility the API claims. As a result, I had to look for other, more stable options for voice recording, and as it turned out, there are not many practical examples of working with voice recording in JS on the internet. What saved me? This video, recorded almost 5 years ago.

Using Recorder.js, which also supports mobile browsers, turned out to be a fairly good solution. I'm no JS expert, but my tests showed generally better recording results across different browsers.

The implementation of all such recorders is very similar: everywhere there is a block for initializing the recorder, and for starting and stopping the recording.

// Assumes selectedMicrophoneId was set by getMicrophones() above
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const constraints = { audio: { deviceId: selectedMicrophoneId } };

navigator.mediaDevices.getUserMedia(constraints)
    .then(function (stream) {
        gumStream = stream;

        // Create a source node from the audio stream
        const input = audioContext.createMediaStreamSource(stream);

        // Create a Recorder.js object with a single channel
        recorder = new Recorder(input, { numChannels: 1 });

        // Start recording
        recorder.record();
    });

// Later, when the user stops the recording:
recorder.stop();

// Stop the microphone stream
gumStream.getAudioTracks()[0].stop();

// Export the recorded data as WAV and send it to the server
recorder.exportWAV(function (blob) {
    console.log("Recording finished");

    // Use FileReader to send the audio file to the server
    let reader = new FileReader();
    reader.readAsArrayBuffer(blob);
    reader.onloadend = function () {
        // Send the audio via the API
        sendAudioAnswer(reader.result);
    };
});

What's the result?

The application is currently launched and available for free to anyone who wants to try an interview with AI and, most importantly, get feedback. This was my first experience writing this kind of application, so I hope the free voice interviewer will be useful, especially for those just starting their career in IT.

What are the possibilities?

You can generate up to 50 questions in various areas, for example analytics, development, or testing. You can answer questions both by voice and in text format, which is very convenient for solving practical tasks. In addition, upon completing the interview you can download the results, where the questions, your answers, and the AI's comments are saved for further analysis.

What is important to know: don't expect the AI to answer you like a real person. Treat the AI interviewer as an assistant that tells you how best to answer a particular question. And yes, with my prompt the AI will always give you comments or suggestions on your answer 🙂

Perhaps an interview with AI still seems unusual or controversial, but it seems to me that in the near future most technical interviews will be conducted by AI, leaving us humans to assess soft skills.

Also, if you would like to contribute to this project, write to me on Telegram; I will be glad of any help.

Let's see what awaits us next!
