Alex Xu
Co-Founder of ByteByteGo | Author of the bestselling book series: ‘System Design Interview’ | YouTube: https://t.co/9gPSJSrtPU

Jan 31, 2023, 12 tweets

/1 How does ChatGPT work?

Disclaimer: since OpenAI hasn't provided all the details, some parts of the diagram may be inaccurate. @sama, we would love to hear your feedback.

We attempted to explain how it works in the diagram below. The process can be broken down into two parts.

/2 1. Training. To train a ChatGPT model, there are two stages:

- Pre-training: In this stage, we train a GPT model (decoder-only transformer) on a large chunk of internet data.

/3 The objective is to train a model that can predict the next word given the preceding text, in a way that is grammatically correct and semantically meaningful.

After the pre-training stage, the model can complete given sentences, but it is not capable of responding to questions.
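To make the next-word objective concrete, here is a minimal sketch that uses a toy bigram counter instead of a transformer. It is not how GPT is implemented; the corpus and the function names (`train_bigram`, `predict_next`, `complete`) are made up for illustration. It only shows the idea of learning from raw text which word tends to come next, and then completing a sentence one word at a time.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def complete(counts, prompt, max_new_words=5):
    """Greedily extend the prompt one predicted word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

# Hypothetical three-sentence "internet" used only for this sketch.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
model = train_bigram(corpus)
print(complete(model, "a dog", max_new_words=2))
```

Like the pre-trained model described above, this toy can continue a sentence, but nothing in it knows how to answer a question.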

/4 - Fine-tuning: This stage is a 3-step process that turns the pre-trained model into a question-answering ChatGPT model:

/5 1). Collect training data consisting of (question, answer) pairs, and fine-tune the pre-trained model on this data. The model takes a question as input and learns to generate an answer similar to those in the training data.
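One way to picture this step is how a (question, answer) pair might be turned into a single training sequence. The sketch below is an assumption, not OpenAI's published pipeline: the whitespace tokenizer, the `<sep>` and `<eos>` markers, and the helper name `build_sft_example` are all hypothetical. The loss mask reflects a common practice of scoring the model only on the answer tokens.

```python
def build_sft_example(question, answer):
    """Join a question and answer into one token sequence plus a loss mask.

    mask = 0 for question tokens (given as input),
    mask = 1 for answer tokens (what the model should learn to generate).
    """
    prompt_tokens = question.split() + ["<sep>"]   # hypothetical separator
    answer_tokens = answer.split() + ["<eos>"]     # hypothetical end marker
    tokens = prompt_tokens + answer_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(answer_tokens)
    return tokens, loss_mask

tokens, loss_mask = build_sft_example("What is 2+2 ?", "It is 4")
for tok, m in zip(tokens, loss_mask):
    print(f"{tok:>6}  {'train' if m else 'input'}")
```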

/6 2). Collect more data consisting of (question, several answers), and train a reward model to rank these answers from most relevant to least relevant.
3). Use reinforcement learning (PPO, Proximal Policy Optimization) to fine-tune the model against the reward model, so the model's answers are more accurate.
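Steps 2) and 3) can be sketched with two small formulas. The thread does not say which losses are used, so treat this as an assumption: the pairwise ranking loss below is the form commonly used for reward models, and the clipped objective is the standard PPO surrogate. The function names and the numbers are illustrative only.

```python
import math
from itertools import combinations

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)): small when the reward
    model scores the human-preferred answer higher."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))

def ranking_loss(rewards_best_to_worst):
    """Average pairwise loss over all answer pairs for one question.
    Input: reward-model scores listed in the human ranking order."""
    pairs = list(combinations(rewards_best_to_worst, 2))
    return sum(pairwise_ranking_loss(b, w) for b, w in pairs) / len(pairs)

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample.
    ratio = new_policy_prob / old_policy_prob for the generated answer."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# Scores that agree with the human ranking give a lower loss.
print(ranking_loss([2.0, 1.0, 0.0]), ranking_loss([0.0, 1.0, 2.0]))
# The clip limits how much a single update can change the policy.
print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
```

In this picture, the advantage would be derived from the reward model's score for a generated answer, which is how the ranking data from step 2) ends up steering the fine-tuning in step 3).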

/7 2. Answer a prompt
🔹Step 1: The user enters the full question.

🔹Step 2: The question is sent to a content moderation component. This component ensures that the question does not violate safety guidelines and filters inappropriate questions.

/8 🔹Steps 3-4: If the input passes content moderation, it is sent to the ChatGPT model. If the input doesn’t pass content moderation, it goes straight to template response generation.

/9 🔹Steps 5-6: Once the model generates the response, it is sent to a content moderation component again. This ensures the generated response is safe, harmless, unbiased, etc.

/10 🔹Step 7: If the response passes content moderation, it is shown to the user. If the response doesn’t pass content moderation, it goes to template response generation and a template answer is shown to the user.
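Steps 1-7 can be read as a small pipeline. The sketch below follows the flow described in this thread, which itself may not match OpenAI's real system. Everything named here is a placeholder: `passes_moderation` is a trivial keyword check standing in for a real moderation component, `BLOCKED_TERMS` and `TEMPLATE_RESPONSE` are made up, and `model` is any function that maps a question to text.

```python
TEMPLATE_RESPONSE = "Sorry, I can't help with that request."  # hypothetical
BLOCKED_TERMS = {"forbidden"}  # hypothetical stand-in for safety guidelines

def passes_moderation(text):
    """Placeholder check: fail if any blocked term appears in the text."""
    words = set(text.lower().split())
    return not (words & BLOCKED_TERMS)

def answer_prompt(question, model):
    # Steps 1-2: the user's question goes to content moderation.
    if not passes_moderation(question):
        # Steps 3-4 (fail): go straight to the template response.
        return TEMPLATE_RESPONSE
    # Steps 3-4 (pass): the question is sent to the model.
    response = model(question)
    # Steps 5-6: the generated response is moderated as well.
    if not passes_moderation(response):
        return TEMPLATE_RESPONSE
    # Step 7: a response that passes is shown to the user.
    return response

def echo_model(question):
    """Stand-in for the ChatGPT model in this sketch."""
    return f"You asked: {question}"

print(answer_prompt("how do transformers work", echo_model))
print(answer_prompt("tell me something forbidden", echo_model))
```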

/11 Subscribe to our weekly free newsletter to learn something new every week: bit.ly/3FEGliw

/12 I hope you've found this thread helpful.

Follow me @alexxubyte for more.

Like/Retweet the first tweet below if you can:
