Hack your next interview with Generative AI

Image generated by Stable Diffusion

Image generated by Stable Diffusion

Translation of an article by Sergei Savvov

Have you ever thought about how cool it would be to hack an interview? Create yourself a digital assistant that will answer all the interviewer’s questions.

So I thought about it. In this article, I propose to create a small application based on Whisper for speech recognition and ChatGPT for text generation. We will also add a simple user interface to make our “cheating” easier.

Application demo

Application demo

Disclaimer: I strongly recommend not using the created application for its intended purpose. The purpose of this article is to demonstrate how, in one evening, you can create a prototype assistant for answering questions, which seemed unthinkable just a year ago.

How it works

Application Diagram

Application Diagram

The application, when you press a button (the main thing is to press it at the right time), begins to record the interviewer’s voice. Then, using the Whisper model, we translate the audio into text. And then we ask chatpt to answer the question posed and display it on the screen. At the same time, all this will not take much time, so you will be able to answer the interviewer’s questions with virtually no delays.

A few notes before we begin:

  1. I deliberately used ready-made APIs so that the solution did not require many resources and could work even on weak laptops.

  2. I tested the functionality of the application only on Linux. You may need to modify audio recording libraries or install additional drivers for other platforms.

  3. You can find all the code in GitHub repositories.

I hope I’ve managed to intrigue you, so let’s get started!

Recording interviewer questions

Since our goal is to develop an application that will work regardless of the platform through which calls are conducted – be it Google Meet, Zoom, Skype, etc., we cannot use the API of these applications. Therefore we we need to record audio directly on our computer.

It is important to note that we will be recording the audio stream not through the microphone, but through the speakers. After a little searching I found the library soundcard. Its authors claim that it is cross-platform, so you should not have any problems with it.

The only drawback for me was the need to specify the exact time range during which the recording will be made. However, this problem can be solved as the recording function returns audio data in numpy array format which can be concatenated.

So, a simple code to record audio through speakers would look like this:

import soundcard as sc


RECORD_SEC = 5
SAMPLE_RATE = 48000
with sc.get_microphone(
      id=str(sc.default_speaker().name),
      include_loopback=True,
  ).recorder(samplerate=SAMPLE_RATE) as mic:
      audio_data = mic.record(numframes=SAMPLE_RATE * RECORD_SEC)

After that we can save it in the format .wav using the library soundfile:

import soundfile as sf


sf.write(file="out.wav", data=audio_data, samplerate=SAMPLE_RATE)

Here you will find code related to audio recording.

Speech recognition

In this step we will use the model Whisper from Open Ai, which can work with multiple languages. During my tests, it showed good text recognition quality, so I decided to go with it. It can also be used via the API:

import openai


def transcribe_audio(path_to_file: str = "out.wav") -> str:
    with open(path_to_file, "rb") as audio_file:
        transcript = openai.Audio.translate("whisper-1", audio_file)
    return transcript["text"]

If you prefer not to use the API, you can run it locally. I would recommend using whisper.cpp. This is a high-performance solution that does not require many resources (author of the library ran the model on an iPhone 13 device).

Here you will find documentation on the Whisper API.

Generating a response

We will use ChatGPT to generate an answer to the interviewer’s question. Although using the API seems like a simple task, we need to solve two additional problems:

1. Editing recognized text – transcripts may be of poor quality. For example, if the interviewer is hard to hear, or the record button is activated too late.

2. Speed ​​up response generation – It is important for us to get the answer as quickly as possible in order to keep the conversation flowing smoothly and prevent doubts.

Improving the quality of transcripts

To deal with this, we’ll explicitly indicate in the system prompt that we’re using potentially imperfect audio transcriptions:

SYSTEM_PROMPT = """You are interviewing for a {INTERVIEW_POSTION} position.
You will receive an audio transcription of the question.
Your task is to understand question and write an answer to it."""

Speeding up text generation

To speed up generation, we will make two simultaneous requests to ChatGPT. This concept is close to the approach described in the article Skeleton-of-Thoughtand is visually presented below:

Diagram of two-way communication with ChatGPT

Diagram of two-way communication with ChatGPT

The first request will generate a quick response, no more than 70 words. This will help continue the interview without awkward pauses:

QUICK = "Concisely respond, limiting your answer to 70 words."

The second request will return a more detailed response. This is necessary to maintain deeper engagement in the conversation:

FULL = """Before answering, take a deep breath and think step by step.
Your answer should not exceed more than 150 words."""

It is worth noting that the request uses the structure “take a deep breath and think step by step”(“take a deep breath and think step by step”), methodwhich recent studies have shown provides the highest quality of responses.

Here you will find code related to the ChatGPT API.

Creating a Simple GUI

Demo of application control using buttons

Demo of application control using buttons

To visualize the response from ChatGPT, we need to create a simple GUI application. After studying several frameworks, I decided to settle on PySimpleGUI. It allows you to create graphical applications easily with extensive a set of widgets. In addition, I needed the following functionality, which I was able to find in this library:

  • The ability to quickly write a working prototype without spending a lot of time delving into the project documentation.

  • Support for running long-running functions in a separate thread.

  • Keyboard key management.

Here is sample code to create a simple application making requests to the OpenAI API in a separate thread using perfrom_long_operation:

import PySimpleGUI as sg


sg.theme("DarkAmber")
chat_gpt_answer = sg.Text(  # we will update text later
    "",
    size=(60, 10),
    background_color=sg.theme_background_color(),
    text_color="white",
)
layout = [
    [sg.Text("Press A to analyze the recording")],
    [chat_gpt_answer],
    [sg.Button("Cancel")],
]
WINDOW = sg.Window("Keyboard Test", layout, return_keyboard_events=True, use_default_focus=False)

while True:
    event, values = WINDOW.read()
    if event in ["Cancel", sg.WIN_CLOSED]:
        break
    elif event in ("a", "A"):  # Press A --> analyze
        chat_gpt_answer.update("Making a call to ChatGPT..")
        WINDOW.perform_long_operation(
            lambda: generate_answer("Tell me a joke about interviewing"),
            "-CHAT_GPT ANSWER-",
        )
    elif event == "-CHAT_GPT ANSWER-":
        chat_gpt_answer.update(values["-CHAT_GPT ANSWER-"])

Here you will find the code associated with the GUI application.

Putting it all together

Now that we’ve covered all the necessary components, it’s time to build our application. Here’s a schematic architecture of what it would look like:

Logic of button operation in the graphical interface

Logic of button operation in the graphical interface

To better understand how this all works, I recorded a demo:

Further work

If you want to improve this solution, here are some tips for improvement:

  • Speed ​​up answers from LLM: To do this, you can use models with relatively fewer parameters, such as LlaMA-2 13Bto speed up response time. And also use various additional acceleration techniques that I wrote about here.

  • Use NVIDIA Broadcast: This model allows your eyes to always look at the camera, even if you decide to look away. In this case, the interviewer will not notice that you are reading the answer.

    NVIDIA Broadcast demo.

    NVIDIA Broadcast demo.

  • Create a browser extension: This can be especially useful if you are asked to do live coding. In this case, you can simply select the task and send it for solution.

Conclusion

So, using Whisper and ChatGPT, we built ourselves an interview assistant.

Of course, it is not recommended to use it for such purposes, because it is at least unethical. Our main goal was to try to expand the boundaries of AI capabilities, as well as to show the depth of this world, because we are standing on the threshold of a new era, and we have opened this door just a little. Who knows what incredible innovations lie ahead.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *