Hack your next interview with Generative AI
Translation of an article by Sergei Savvov
Have you ever thought about how cool it would be to hack an interview? Create yourself a digital assistant that will answer all the interviewer’s questions.
So I thought about it. In this article, I propose building a small application based on Whisper for speech recognition and ChatGPT for text generation. We will also add a simple user interface to make our “cheating” easier.
Disclaimer: I strongly recommend not using the created application for its intended purpose. The purpose of this article is to demonstrate how, in one evening, you can create a prototype assistant for answering questions, which seemed unthinkable just a year ago.
How it works
When you press a button (the trick is pressing it at the right moment), the application starts recording the interviewer’s voice. We then use the Whisper model to convert the audio to text, ask ChatGPT to answer the question, and display the answer on screen. The whole pipeline takes little time, so you will be able to answer the interviewer’s questions with virtually no delay.
A few notes before we begin:
I deliberately used ready-made APIs so that the solution did not require many resources and could work even on weak laptops.
I tested the functionality of the application only on Linux. You may need to modify audio recording libraries or install additional drivers for other platforms.
You can find all the code in GitHub repositories.
I hope I’ve managed to intrigue you, so let’s get started!
Recording interviewer questions
Since our goal is an application that works regardless of the platform the call runs on (Google Meet, Zoom, Skype, etc.), we cannot use those applications’ APIs. Instead, we need to record audio directly on our computer.
It is important to note that we will be recording the audio stream not through the microphone, but through the speakers. After a little searching I found the library soundcard. Its authors claim that it is cross-platform, so you should not have any problems with it.
The only drawback for me was the need to specify the exact time range during which the recording will be made. However, this problem can be solved, since the recording function returns audio data as numpy arrays, which can be concatenated.
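One way around the fixed duration is to record short chunks in a loop and join them afterwards. The sketch below shows only the joining step in numpy (in the real application the chunks would come from repeated `mic.record()` calls; the zero-filled arrays here are stand-ins):

```python
import numpy as np

SAMPLE_RATE = 48000
CHUNK_SEC = 1  # record one-second slices in a loop instead of one long take

def join_chunks(chunks):
    """Concatenate per-chunk recordings along the time axis.

    Each chunk is what mic.record() returns: a float array of
    shape (frames, channels)."""
    return np.concatenate(chunks, axis=0)

# e.g. three one-second stereo chunks -> one three-second recording
chunks = [np.zeros((SAMPLE_RATE * CHUNK_SEC, 2), dtype=np.float32) for _ in range(3)]
audio = join_chunks(chunks)
print(audio.shape)  # (144000, 2)
```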
So, a simple code to record audio through speakers would look like this:
import soundcard as sc

RECORD_SEC = 5
SAMPLE_RATE = 48000

# record the system's audio output (loopback), not the physical microphone
with sc.get_microphone(
    id=str(sc.default_speaker().name),
    include_loopback=True,
).recorder(samplerate=SAMPLE_RATE) as mic:
    audio_data = mic.record(numframes=SAMPLE_RATE * RECORD_SEC)
After that, we can save it in .wav format using the soundfile library:
import soundfile as sf
sf.write(file="out.wav", data=audio_data, samplerate=SAMPLE_RATE)
Here you will find code related to audio recording.
Speech recognition
In this step we will use the Whisper model from OpenAI, which supports multiple languages. During my tests it showed good recognition quality, so I decided to go with it. It can be used via the API:
import openai

def transcribe_audio(path_to_file: str = "out.wav") -> str:
    # Audio.translate transcribes the speech and translates it into English;
    # use Audio.transcribe instead to keep the original language
    with open(path_to_file, "rb") as audio_file:
        transcript = openai.Audio.translate("whisper-1", audio_file)
    return transcript["text"]
If you prefer not to use the API, you can run the model locally. I would recommend whisper.cpp: a high-performance solution that does not require many resources (the library’s author even ran the model on an iPhone 13).
Here you will find documentation on the Whisper API.
Generating a response
We will use ChatGPT to generate an answer to the interviewer’s question. Although using the API seems like a simple task, we need to solve two additional problems:
1. Editing the recognized text – transcripts may be of poor quality, for example if the interviewer is hard to hear or the record button is pressed too late.
2. Speeding up response generation – we need the answer as quickly as possible to keep the conversation flowing smoothly and avoid arousing suspicion.
Improving the quality of transcripts
To deal with this, we’ll explicitly indicate in the system prompt that we’re using potentially imperfect audio transcriptions:
SYSTEM_PROMPT = """You are interviewing for a {INTERVIEW_POSITION} position.
You will receive an audio transcription of the question.
Your task is to understand the question and write an answer to it."""
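To make this concrete, here is a minimal sketch of how the system prompt might be filled in and paired with a transcript into the messages list that the ChatCompletion API expects. The position string and the `build_messages` helper are illustrative, not part of the original project:

```python
SYSTEM_PROMPT = """You are interviewing for a {INTERVIEW_POSITION} position.
You will receive an audio transcription of the question.
Your task is to understand the question and write an answer to it."""

def build_messages(transcript: str, position: str = "Python Developer") -> list:
    """Fill in the position and pair the system prompt with the
    (possibly noisy) Whisper transcript as the user message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT.format(INTERVIEW_POSITION=position)},
        {"role": "user", "content": transcript},
    ]

messages = build_messages("So, uh, can you explain what a Python decorator is?")
print(messages[0]["content"].splitlines()[0])
# You are interviewing for a Python Developer position.
```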
Speeding up text generation
To speed up generation, we will make two simultaneous requests to ChatGPT. This concept is close to the approach described in the Skeleton-of-Thought paper and is illustrated below:
The first request will generate a quick response, no more than 70 words. This will help continue the interview without awkward pauses:
QUICK = "Concisely respond, limiting your answer to 70 words."
The second request will return a more detailed response. This is necessary to maintain deeper engagement in the conversation:
FULL = """Before answering, take a deep breath and think step by step.
Your answer should not exceed 150 words."""
It is worth noting that the prompt uses the phrase “take a deep breath and think step by step”, a method that recent studies have shown yields the highest-quality responses.
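The two simultaneous requests can be fired with a thread pool. In this sketch the actual API call is injected as a plain callable (`ask`), so the pattern is visible without an API key; in the real application `ask` would wrap a single `openai.ChatCompletion.create` call with the system prompt plus the style instruction:

```python
from concurrent.futures import ThreadPoolExecutor

QUICK = "Concisely respond, limiting your answer to 70 words."
FULL = """Before answering, take a deep breath and think step by step.
Your answer should not exceed 150 words."""

def generate_answers(question: str, ask) -> dict:
    """Submit the quick and full prompts at the same time.

    `ask` is any callable taking (question, style_prompt) and returning
    the answer text."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        quick = pool.submit(ask, question, QUICK)
        full = pool.submit(ask, question, FULL)
        # show quick.result() first to avoid an awkward pause,
        # then replace it once full.result() arrives
        return {"quick": quick.result(), "full": full.result()}

# wired with a stub just to show the flow:
out = generate_answers("What is a deadlock?", lambda q, style: style.split()[0] + ": " + q)
print(out["quick"])  # Concisely: What is a deadlock?
```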
Here you will find code related to the ChatGPT API.
Creating a Simple GUI
To visualize the response from ChatGPT, we need to create a simple GUI application. After studying several frameworks, I decided to settle on PySimpleGUI. It makes it easy to create graphical applications and offers an extensive set of widgets. In addition, I needed the following functionality, which this library provides:
The ability to quickly write a working prototype without spending a lot of time delving into the project documentation.
Support for running long-running functions in a separate thread.
Keyboard key management.
Here is sample code for a simple application that makes requests to the OpenAI API in a separate thread using perform_long_operation:
import PySimpleGUI as sg

sg.theme("DarkAmber")

chat_gpt_answer = sg.Text(  # we will update this text later
    "",
    size=(60, 10),
    background_color=sg.theme_background_color(),
    text_color="white",
)

layout = [
    [sg.Text("Press A to analyze the recording")],
    [chat_gpt_answer],
    [sg.Button("Cancel")],
]

WINDOW = sg.Window(
    "Keyboard Test", layout, return_keyboard_events=True, use_default_focus=False
)

while True:
    event, values = WINDOW.read()
    if event in ["Cancel", sg.WIN_CLOSED]:
        break
    elif event in ("a", "A"):  # press A --> analyze
        chat_gpt_answer.update("Making a call to ChatGPT..")
        # generate_answer is the ChatGPT helper from the previous section
        WINDOW.perform_long_operation(
            lambda: generate_answer("Tell me a joke about interviewing"),
            "-CHAT_GPT ANSWER-",
        )
    elif event == "-CHAT_GPT ANSWER-":
        chat_gpt_answer.update(values["-CHAT_GPT ANSWER-"])
Here you will find the code associated with the GUI application.
Putting it all together
Now that we’ve covered all the necessary components, it’s time to build our application. Here’s a schematic architecture of what it would look like:
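The glue code can be sketched as a single function where each stage is injected as a callable, so the pieces from the previous sections plug in directly. All names here are illustrative, and the lambdas are stubs standing in for the real recording, Whisper, and ChatGPT calls:

```python
def answer_question(record, transcribe, generate) -> str:
    """One button press end to end: record the speakers, transcribe
    the audio, and generate an answer."""
    audio_path = record()              # e.g. save loopback audio to out.wav
    question = transcribe(audio_path)  # e.g. transcribe_audio(audio_path)
    return generate(question)          # e.g. the quick/full ChatGPT requests

# wired with stubs just to show the flow:
answer = answer_question(
    record=lambda: "out.wav",
    transcribe=lambda path: "Tell me about the GIL in Python",
    generate=lambda q: "(stubbed answer to: " + q + ")",
)
print(answer)  # (stubbed answer to: Tell me about the GIL in Python)
```

In the real application this function would run inside `perform_long_operation` so the GUI stays responsive while the stages execute.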
To better understand how this all works, I recorded a demo:
Further work
If you want to improve this solution, here are some tips for improvement:
Speed up answers from the LLM: use models with fewer parameters, such as LLaMA-2 13B, to reduce response time, and apply the additional acceleration techniques that I wrote about here.
Use NVIDIA Broadcast: its eye-contact feature makes you appear to be looking at the camera even when you glance away, so the interviewer will not notice that you are reading the answer.
Create a browser extension: this can be especially useful if you are asked to do live coding. You could simply select the task and send it off to be solved.
Conclusion
So, using Whisper and ChatGPT, we built ourselves an interview assistant.
Of course, using it for such purposes is not recommended, because it is unethical at the very least. Our main goal was to push the boundaries of AI capabilities and show the depth of this field: we are standing on the threshold of a new era, and we have only opened this door a crack. Who knows what incredible innovations lie ahead.