Interview Bot
tl;dr
This project uses the OpenAI Whisper and Completions APIs, PyAudio, PyQt6, and custom classes and scripts to create a teleprompter to aid an interviewee in answering interview questions. You can feed in a summary of your past experience (e.g. a modified resume) to personalize the answers. For a demo of the interview bot, see this video.
Disclaimer to prospective interviewers reading this - I don’t use this for interviews, I swear!
Summary
This Python project creates a teleprompter-like experience to help you answer interview questions. It’ll record the system audio in real time, transcribe it, and stream an answer from the OpenAI Completions API (ChatGPT) to a teleprompter underneath your camera so that you can read the answer without having to move your eyes (and give away that you’re reading something). If you’re interested in trying the interview bot out for yourself (or critiquing my code), you can find the code and instructions on how to run it in this repo.
I’ll explain things in more detail below, but I wanted to call out a couple things that I think are especially interesting with this project:
- Streaming the response from the Completions API - If you call the OpenAI API normally, it waits for the entire completion to finish before giving you any answer. That would be like ChatGPT waiting until the whole response was generated before displaying anything on your screen. In testing, I found this created far too long a delay, especially for meatier questions and responses. Through some googling (alas, ChatGPT isn't trained on its own API), I found some Python code showing how to use the streaming option in the API's Python package. It took some fiddling with threads, but if you watch the demo below you can see the text pop up like it does on ChatGPT.
- Different UI options to suit your liking - I knew it would be fairly trivial to put ChatGPT's answers on the screen, but then you run into the problem of obviously reading while you're talking. Even very small eye movements show up on video, and the interviewer will be able to tell something is up. I tried a few different layouts before settling on the current default (though the others are still implemented). I started with a speed-reader-style UI where one word is shown at a time (I actually showed 5 words with the center word bolded, in case you read slower or faster than the pace), but that felt too choppy. Then I moved to a traditional vertically scrolling teleprompter paragraph (like they use for speeches and the news), but found that at the close distance of a computer screen you could still see the eyes move even on a line of only ~2-3 words. So I combined the two into the current horizontal scroll and added the ability to speed the scrolling up or slow it down to match your pace.
- ChatGPT links in the readme - in the readme for the repo (basically the instructions on how to run the code on your machine), there were a number of more general instructions for which I linked to ChatGPT transcripts rather than linking to the actual documentation pages or other tutorials. For things like creating a virtual environment, I found the ChatGPT instructions much more helpful than your average tutorial, I didn’t have to write it, and it gave instructions for multiple OSes.
The rest of this project page will walk through how I accomplished each part of the response process.
Recording system audio
Simply recording the audio is pretty straightforward with PyAudio. Once ChatGPT has the script written, the only thing to consider is how to trigger the recording. I wanted the interview bot to be as self-sustaining as possible, so while starting the recording was always manual, at first I had it auto-stop when a few seconds of silence were detected from the audio source:
self.silence_counter = 0
for i in range(0, int(self.RATE / self.CHUNK_SIZE * self.RECORD_SECONDS)):
    data = self.stream.read(self.CHUNK_SIZE)
    self.wave_file.writeframes(data)
    # measure how loud this chunk was and count how long it has been quiet
    rms_energy = audioop.rms(data, 2)
    if rms_energy < self.SILENCE_THRESHOLD:
        self.silence_counter += 1
        if self.silence_counter >= self.SILENCE_DURATION * (self.RATE / self.CHUNK_SIZE):
            print("Silence detected. Recording stopped.")
            break
    else:
        self.silence_counter = 0
In addition to being a bit inaccurate (maybe the interviewer just went on mute to sneeze), the silence detection added a couple more seconds on top of the existing transcription and response delays, which made it too clunky to use in an interview, so I dropped the auto-stop.
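Stopping the recording manually instead keeps the loop simple; here's a rough sketch of that idea (the is_recording flag and the record_until_stopped name are placeholders of mine, not the repo's actual code):
# sketch only: keep recording until the hotkey handler flips the flag
# (self.stream, self.wave_file, and self.CHUNK_SIZE mirror the snippet above)
def record_until_stopped(self):
    self.is_recording = True
    while self.is_recording:
        data = self.stream.read(self.CHUNK_SIZE)
        self.wave_file.writeframes(data)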
Transcribing an audio file
With OpenAI, transcription is so simple that I can include the whole file here:
import openai

# v basic. transcribe an audio file to text using the Whisper OpenAI API
class Transcriber:
    def __init__(self, api_key, model):
        openai.api_key = api_key
        self.model = model

    def transcribe_audio(self, audio_file_path):
        with open(audio_file_path, "rb") as audio_file:
            transcript = openai.Audio.transcribe(self.model, audio_file)
        return transcript["text"]
I tried to keep the interfaces for the transcription and response classes basic and generalizable so that I could experiment with different models easily, but I haven’t gotten around to that yet.
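For reference, using the class is just a couple of lines; here's a rough usage sketch (the file path is a placeholder, and "whisper-1" is the model name the Whisper endpoint expects):
import os

# rough usage sketch of the Transcriber class above
transcriber = Transcriber(api_key=os.environ["OPENAI_API_KEY"], model="whisper-1")
question_text = transcriber.transcribe_audio("question.wav")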
Responding to the question
Similarly, OpenAI makes the code part of the responder easy:
import openai

class TextResponder:
    # Set up the API key and model for use when generating responses
    def __init__(self, api_key, model, starting_messages):
        openai.api_key = api_key
        self.model = model
        self.messages = starting_messages

    ...

    # Generate a response to the given message, but stream the response as it is generated
    # adapted from https://github.com/trackzero/openai/blob/main/oai-text-gen-with-secrets-and-streaming.py
    # note: function is a generator, you must iterate over it to get the results
    def generate_response_stream(self, next_message):
        self.messages.append({"role": "user", "content": next_message})
        response = openai.ChatCompletion.create(
            model=self.model, messages=self.messages, stream=True
        )
        # event variables
        collected_chunks = []
        collected_messages = ""

        # capture event stream
        for chunk in response:
            collected_chunks.append(chunk)  # save the event response
            chunk_message = chunk["choices"][0]["delta"]  # extract the message
            if "content" in chunk_message:  # make sure the message has "content" (the string we care about)
                message_text = chunk_message["content"]
                collected_messages += message_text
                yield message_text

        # once all chunks are received, save the final message
        self.messages.append({"role": "assistant", "content": collected_messages})
OpenAI conveniently has a stream option on the Completions call, which turns the response from a single JSON object containing the full text into an event stream that can be iterated over. generate_response_stream is itself a generator (you can tell by the yield), so the caller can iterate over it in turn and act on the partial text returned with each event from the OpenAI call.
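To make that concrete, consuming the generator looks roughly like this (the model name, system prompt, and question are placeholders; the real app feeds the chunks to the teleprompter instead of printing them):
import os

responder = TextResponder(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-3.5-turbo",
    starting_messages=[{"role": "system", "content": "Answer as the interviewee."}],
)

# nothing is sent to OpenAI until iteration starts; each loop gets the next chunk of text
for partial_text in responder.generate_response_stream("Tell me about yourself."):
    print(partial_text, end="", flush=True)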
Displaying it on the screen
The Teleprompter code is more complex than the previous code for a few reasons: UIs in Python are notoriously difficult to work with, ChatGPT isn't great at using the UI library, and I wasn't really sure what I wanted when I started designing this part of the code. It took a lot of trial and error to identify a design pattern that I thought worked well.
As I mentioned above, I tried three different scroll mechanisms with this. I started with a speed read style interface, bolding a single word at a time, then tried a traditional vertically scrolling teleprompter, and finally settled on the horizontal scrolling in the demo video below. I also started with the background of the teleprompter being transparent, but changed it to white text on a black background because I found that easier to read.
I won’t go into too much detail since I tried to comment the code for this class decently well, and ChatGPT is good at explaining the code despite having a hard time generating it. The two methods I want to call out here control the text speed. For the bolding method, I fiddled around with various speeds, but found that for the phrasing to sound natural (and to not get behind or ahead in the reading), you have to adjust how long each word stays up based on the length of the word and whether or not it ends with punctuation. (The @staticmethod below tells Python that the method doesn't need a self param, and I think it lets Python do some optimizations with that information):
# pick the current word and sleep long enough for it to be read before advancing
current_word = self.words[self.current_index]
adjusted_speed = self._adjust_speed_based_on_word_length(
    current_word, self.time_per_word, 0.15
)
if current_word.endswith((".", ",", ";", ":", "?", "!")):
    time.sleep(adjusted_speed * self.punctuation_delay)
else:
    time.sleep(adjusted_speed)

...

# Adjust the update speed based on the length of the word
@staticmethod
def _adjust_speed_based_on_word_length(word, base_speed, multiplier):
    adjusted_speed = base_speed * max(
        len(word) * multiplier, 0.5
    )  # Ensure a minimum speed
    return adjusted_speed
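With the 0.15 multiplier used above, a one-letter word like "a" hits the 0.5 floor and only stays up for half the base time per word, while a nine-letter word like "interview" stays up for 9 × 0.15 = 1.35 times the base time (and that delay gets further multiplied by punctuation_delay if the word ends in punctuation).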
To improve on this simplistic timing, you could build a model (or use a dictionary) to predict how long each word would take to say. I've been going through Andrej Karpathy's Intro to Neural Nets YouTube channel, so maybe I'll use this as a capstone project to prove that I understand how it works.
For the scroll updates, we can use this code to change the scroll bar:
# Perform the scroll by updating the QTextEdit's horizontal scrollbar's value
scroll_bar = self.text_widget.horizontalScrollBar()
new_value = scroll_bar.value() + 1
if new_value <= scroll_bar.maximum():
    scroll_bar.setValue(new_value)
    time.sleep(self.scroll_delay)  # Modify this value to adjust the scroll speed
else:
    time.sleep(0.05)  # Don't use CPU excessively if we're at the end
and we update the scroll delay to speed up or slow down the scrolling speed:
# Start the teleprompter scrolling, speed it up, or slow it down (depending on the current state)
def play(self):
    # If the teleprompter is already playing, speed it up
    if self.scroll_state == "play":
        self.scroll_delay /= self.SCROLL_DELAY_MULTIPLIER
    # If the teleprompter is reversing, slow it down
    elif self.scroll_state == "reverse":
        self.scroll_delay *= self.SCROLL_DELAY_MULTIPLIER
    # Otherwise (the teleprompter is stopped), start the teleprompter
    else:
        self.scroll_delay = self.base_scroll_delay
        self.scroll_state = "play"
It took some fiddling to set the default scroll delays (they're different for horizontal and vertical scrolling) and the multipliers for speeding up and slowing down. If you try this out on your own, I encourage you to test a few settings to find what's comfortable. I should probably rename the play and reverse methods since they now double as speed-up and slow-down methods depending on the current state.
Putting it all together
With all the components in place, I created a CentralController class to interface with all of the component classes. It uses keyboard hotkeys to trigger the methods that control each class:
# Define the hotkeys and their corresponding methods
# Multiple successive keypresses will call a method multiple times
hotkeys = {
    "Key.shift": self.trigger_recording,
    "Key.shift_r": self.trigger_recording,
    "'d'": self.start_scrolling,
    "'s'": self.stop_scrolling,
    "'a'": self.reverse_scrolling,
    "Key.esc": exit,
    "'t'": self.test_with_test_string,
}
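The keys in that dict ("Key.shift", "'d'", and so on) are just the string form of the key events, which matches how pynput stringifies keys, so dispatching a keypress can be a simple dictionary lookup. Here's a sketch of that wiring, assuming a pynput keyboard listener is doing the listening (the class here is a stripped-down stand-in, not the real CentralController):
from pynput import keyboard

class HotkeyDispatcher:  # stand-in for the hotkey-handling part of the controller
    def __init__(self, hotkeys):
        self.hotkeys = hotkeys
        # the listener runs on its own thread, so keypresses are handled
        # while the recorder, responder, and teleprompter do their thing
        self.listener = keyboard.Listener(on_press=self.on_press)
        self.listener.start()

    def on_press(self, key):
        # str(key) gives "Key.shift" for special keys and "'d'" for character keys,
        # matching the keys in the hotkeys dict above
        action = self.hotkeys.get(str(key))
        if action is not None:
            action()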
Conceptually, the sequence of events is straightforward, but since many of the actions have to happen in parallel (like updating the UI while we're still receiving the event stream from OpenAI), it requires careful use of threading. I use both PyQt6 threads and built-in Python threads in this project. PyQt6 threads are needed for the UI updates because Qt widgets can only be safely touched from the GUI thread, so background work has to hand its results over through Qt's signal/slot mechanism, but I'm more familiar with traditional Python threads. If you're interested, you can check out the details of the threading in the linked repo.
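Just to sketch the pattern (the names here are mine, not the repo's): a plain Python thread can consume the OpenAI stream and hand each chunk to the GUI thread through a Qt signal, since emitting a signal is safe from any thread and Qt runs the connected slot on the thread that owns the receiving widget.
import threading
from PyQt6.QtCore import QObject, pyqtSignal

# sketch: the worker thread emits a signal for each chunk; Qt queues it
# and delivers it to the GUI thread, where the teleprompter text gets updated
class StreamBridge(QObject):
    chunk_received = pyqtSignal(str)

def stream_answer(bridge, responder, question):
    for partial in responder.generate_response_stream(question):
        bridge.chunk_received.emit(partial)  # safe to call from a worker thread

# wiring (teleprompter.append_text stands in for whatever method actually updates the text)
bridge = StreamBridge()
bridge.chunk_received.connect(teleprompter.append_text)
threading.Thread(target=stream_answer, args=(bridge, responder, "Tell me about yourself."), daemon=True).start()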
With all the classes built and connected together, here’s the Interview Bot so far:
I am not sure if I’ll continue working on this project, but I can think of a few cool ways to extend it:
- Turn the interview bot into an interviewer for mock interviews. You could even have it score you after the interview.
- Try out some different completion models like Llama or Claude and compare the quality of the answers.
- Modify the prompt to give you inspiration instead of writing the whole response for you. For example, in a behavioral interview it could keep a list of your past stories, pick the best one for that specific question, then remind you which one it is and give you a couple of key details to work into your response.
Let me know via email if you want any more information or if you have any general feedback!