Let's go over just the important parts for prompting ChatGPT.
First, I decided to keep as many of the human questions as possible.
If we're just seeing the first question, then we only need the initial question from the user.
If not, we need to keep the system message and up to 750 tokens of user questions (token limits!).
But why?
Well, we want to make sure that, in case the conversation changes directions, we always add more relevant code snippets to the conversation.
Adding the previous queries helps by giving the latest user question more context.
This is a design decision, but it helped!
For clarity:
This just means that new code snippets are the ones that are most similar to the concatenation of all previous user queries, not just the most recent.
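To make that concrete, here's a minimal sketch of how that retrieval query could be built. The helper names (count_tokens, build_retrieval_query, retrieve_snippets) and the exact bookkeeping are my own stand-ins, not the repo's code:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
USER_QUESTION_BUDGET = 750  # rough cap on tokens of prior user questions

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

def build_retrieval_query(messages: list[dict]) -> str:
    """Concatenate the user's questions, keeping roughly 750 tokens of the most recent ones."""
    user_questions = [m["content"] for m in messages if m["role"] == "user"]
    if len(user_questions) == 1:
        # First turn: the initial question is all we have (and all we need).
        return user_questions[0]

    kept, total = [], 0
    # Walk backwards so the most recent questions are kept first.
    for q in reversed(user_questions):
        t = count_tokens(q)
        if total + t > USER_QUESTION_BUDGET:
            break
        kept.append(q)
        total += t
    return "\n".join(reversed(kept))

# That query then gets embedded and matched against the code index, e.g.
# snippets = retrieve_snippets(build_retrieval_query(chat_history))
```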
But, we won't necessarily use all of these messages for question answering.
That comes next!
Now, we need to decide what context to keep for question answering.
Here's how we decide which messages to include in our chat history (there's a code sketch after the walkthrough below).
We need to:
1. Not go over the token limit
2. Avoid losing important context
First, we need to figure out how many tokens are in the two messages that we *must* keep: the system message (with our initial code snippets) and the latest user question (with added context).
After that, we want to add in messages in the middle one at a time.
Honestly, there are a bunch of ways of doing this.
I chose to keep as many of the most recent other messages as the token budget allows (leaving room for the system message, the query, and the newly-added context). Our limit is ~4000 tokens for gpt-3.5-turbo.
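In code, that filtering step might look roughly like this. It's a sketch, reusing the hypothetical count_tokens helper from the earlier snippet; the constants and names are illustrative, not the repo's actual identifiers:

```python
TOKEN_LIMIT = 4000       # approximate context window for gpt-3.5-turbo
RESPONSE_BUDGET = 500    # leave some room for the model's reply (an assumption)

def filter_chat_history(messages: list[dict]) -> list[dict]:
    """Keep the system message, the latest user question, and as many of the
    most recent middle messages as fit under the token limit."""
    system_msg, latest_question = messages[0], messages[-1]
    middle = messages[1:-1]

    budget = TOKEN_LIMIT - RESPONSE_BUDGET
    used = count_tokens(system_msg["content"]) + count_tokens(latest_question["content"])

    kept_middle = []
    # Add middle messages newest-first until we'd blow past the budget.
    for msg in reversed(middle):
        t = count_tokens(msg["content"])
        if used + t > budget:
            break
        kept_middle.append(msg)
        used += t

    return [system_msg, *reversed(kept_middle), latest_question]
```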
There are fancier ways to do this, though.
For instance, you could keep the most similar context to the newest question using embeddings.
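A quick sketch of that alternative, assuming some embed() function that turns text into a vector (e.g. an embeddings API call):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_middle_by_similarity(middle: list[dict], latest_question: str, embed) -> list[dict]:
    """Order middle messages by similarity to the newest question, so the most
    relevant ones (rather than just the most recent) get added to the budget first."""
    q_vec = np.array(embed(latest_question))
    scored = [(cosine(np.array(embed(m["content"])), q_vec), m) for m in middle]
    return [m for _, m in sorted(scored, key=lambda s: s[0], reverse=True)]
```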
This filtered chat history is what's actually fed into ChatGPT.
Finally, we call the OpenAI API, and we're off!
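Concretely, that's just the chat completions endpoint with the filtered history. A minimal sketch using the pre-1.0 openai Python package, reusing filter_chat_history from above:

```python
import openai

def answer(chat_history: list[dict]) -> str:
    """Send the filtered history to gpt-3.5-turbo and return the reply text."""
    filtered = filter_chat_history(chat_history)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=filtered,
    )
    return response["choices"][0]["message"]["content"]
```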
The whole process repeats, never going over the token limit + never losing the system message.
If you're curious to see the full code, it's right here (in the backend directory): github.com/mtenenholtz/ch…