Jim Fan
@NVIDIA Sr. Research Manager & Cofounder of GEAR Lab (Embodied AI). Lead of Project GR00T: Solving General Robotics. @Stanford Ph.D. @OpenAI's 1st intern.

Dec 8, 2022, 21 tweets

Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it may likely disappear soon:👇

The original GPT-3 was trained by a minimalistic objective: predict the next word on a massive text corpus. Many abilities magically emerge, such as reasoning, coding, translation. You can even do “few-shot learning”: define new tasks by providing I/O examples in context. 1/
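For the curious, here's a minimal PyTorch sketch of that pretraining objective (illustrative only, not OpenAI's code): the whole thing is just next-token cross-entropy, and "few-shot learning" simply packs the I/O examples into the same context window the model conditions on.

```python
import torch
import torch.nn.functional as F

# Toy setup: in reality `logits` come from a large Transformer run over a
# massive corpus; here we fake them with random numbers just to show the loss.
vocab_size, seq_len = 50_000, 8
tokens = torch.randint(vocab_size, (1, seq_len))   # a tokenized text snippet
logits = torch.randn(1, seq_len, vocab_size)       # stands in for model(tokens)

# Position t is trained to predict token t+1: plain next-word prediction.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),        # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),                     # targets are the *next* tokens
)
print(loss)
```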

It’s not at all obvious why simply predicting the next word can give us such abilities. One intuitive explanation is to imagine a detective story. Suppose the model needs to fill in the last blank: “the murderer is ___”, then it has to do deep reasoning to answer correctly. 2/

But this is not enough. In practice, we have to coax GPT-3 to autocomplete what we desire by carefully curating the examples, wording, and structure. This is exactly “prompt engineering”, where users have to practice the awkward and sometimes nonsensical vernacular of LLMs. 3/

Prompt engineering is a BUG🐞, not a feature! It’s caused by the fundamental *misalignment* between the next-word objective and the actual user intent in real applications. Example: you want GPT-3 to “Explain the moon landing to a 6yo”. It replies like a drunk parrot🦜: 4/

Prompt engineering is even worse in DALLE2 and Stable Diffusion. Just go to lexica.art and see how insane some prompts are. My favorite is the “parentheses trick” - adding (((…))) sometimes gives you better images 😅. It’s both hilarious and embarrassing. 5/

ChatGPT and its sibling model InstructGPT address this plague in an elegant way. The key observation is that alignment is very hard to capture from in-the-wild data alone. Humans must be in the loop to help tutor GPT, and GPT will be able to ask better questions as it improves. 6/

There are 3 steps. The first is very straightforward: just collect a dataset of human-written answers to prompts that users submit, and finetune GPT by supervised learning. It’s the easiest step but also the most costly: it can be slow and painful for humans to write long responses. 7/
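A minimal sketch of this supervised finetuning step, assuming a generic Hugging Face-style causal LM (the model and tokenizer here are placeholders, not OpenAI's actual pipeline):

```python
import torch

def sft_loss(model, tokenizer, prompt: str, human_answer: str):
    """Step 1 (sketch): finetune on a human-written demonstration."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(human_answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

    # Supervise only the answer tokens; prompt tokens are masked out with -100.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    # Hugging Face causal LMs return the (shifted) cross-entropy loss directly.
    return model(input_ids, labels=labels).loss
```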

Step 2 is much more interesting. GPT is asked to *propose* a few different answers, and all a human annotator needs to do is *rank* the responses from most desirable to least. Using these labels, we can train a reward model that captures human *preferences*. 8/
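The ranking labels are typically turned into a pairwise loss, as in the InstructGPT paper: the reward model should score the preferred answer higher than the dispreferred one. A rough PyTorch sketch, where `reward_model` is a hypothetical LM with a scalar head:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """Step 2 (sketch): learn human preferences from rankings.

    A full ranking of K answers decomposes into all (chosen, rejected) pairs.
    """
    r_chosen = reward_model(torch.cat([prompt_ids, chosen_ids], dim=1))      # scalar per sample
    r_rejected = reward_model(torch.cat([prompt_ids, rejected_ids], dim=1))

    # Push the preferred answer's reward above the dispreferred one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```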

In reinforcement learning (RL), the reward function is typically hardcoded, such as the game score in Atari games. ChatGPT’s data-driven reward model is a powerful idea. Another example is our recent MineDojo work that learns reward from tons of Minecraft YouTube videos: 9/

Step 3: treat GPT as a policy and optimize it by RL against the learned reward. PPO is chosen as a simple and effective training algorithm. Now that GPT is better aligned, we can rinse and repeat steps 2-3 to improve it continuously. It’s like CI for LLMs! 10/
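Roughly, the scalar that PPO maximizes is the learned reward minus a KL penalty that keeps the RL policy close to the supervised model, so it can't drift into gibberish that merely games the RM. A sketch, with an illustrative `beta`:

```python
import torch

def rlhf_reward(rm_score, logprobs_policy, logprobs_sft, beta=0.02):
    """Step 3 (sketch): the scalar reward fed to PPO for one sampled response."""
    # Per-token log-prob gap between the RL policy and the frozen SFT model,
    # summed over the response, approximates a KL penalty against drifting.
    kl = (logprobs_policy - logprobs_sft).sum(dim=-1)
    return rm_score - beta * kl   # this quantity is then maximized with PPO

# Toy usage with fake numbers:
print(rlhf_reward(torch.tensor([1.3]), torch.randn(1, 20), torch.randn(1, 20)))
```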

This is the “Instruct” paradigm - a super effective way to do alignment, as evidenced by ChatGPT’s mind-blowing demos. The RL part also reminds me of the famous P=NP (or ≠) problem: it tends to be much easier to verify a solution than to produce one from scratch. 11/

Similarly, humans can quickly assess the quality of GPT’s output, but it’s much harder and cognitively taxing to write out a full solution. InstructGPT exploits this fact to lower the manual labeling cost significantly, making it practical to scale up the model CI pipeline. 12/

Another interesting connection: the Instruct training loop looks a lot like a GAN. Here ChatGPT is the generator and the reward model (RM) is the discriminator. ChatGPT tries to fool the RM, while the RM learns, with human help, to detect alien (machine-sounding) output. The game converges when the RM can no longer tell the difference. 13/

Model alignment with user intent is also making its way to image generation! There are some preliminary works, such as arxiv.org/abs/2211.09800. Given the explosive AI progress, how long will it take to have an Instruct- or Chat-DALLE that feels like talking to a real artist? 14/

So folks, enjoy prompt engineering while it lasts! It’s an unfortunate historical artifact - a bit like alchemy🧪, neither art nor science. Soon it will just be “prompt writing” - my grandma can get it right on her first try. No more magic incantations to coerce the model. 15/

Of course, ChatGPT isn’t yet good enough to eliminate prompt engineering entirely, but it is an unstoppable force. Meanwhile, the model has other serious syndromes: hallucination & habitual BS. I covered this in another thread: 16/

There are ongoing open-source efforts for the Instruct paradigm! To name a few:
👉 trlx @carperai github.com/CarperAI/trlx. Carper AI is an org from @StabilityAI
👉 RL4LM @rajammanabrolu @allen_ai rl4lms.apps.allenai.org
I’m so glad to have met the above authors at NeurIPS! 17/

Further reading: the reward model has scaling laws too: arxiv.org/abs/2210.10760! Also, the RM is only an imperfect proxy for human preference (unlike a ground-truth Atari score), so it’s a bad idea to over-optimize against it. This paper is from @johnschulman2, inventor of PPO. Super interesting work that went under the radar. 18/

There are also other artifacts caused by the misalignment problem, such as prompt hacking or “injection”. I actually like this one because it allows us to bypass OpenAI’s prompt prefix and fully unleash the model 😆. See @goodside’s cool findings: 19/

Thanks for reading! Follow me for more deep dives into the latest AI tech 🙌.
References:
👉 openai.com/blog/instructi…
👉 InstructGPT paper: arxiv.org/abs/2203.02155
👉 openai.com/blog/chatgpt/
👉 Beautiful illustrations: jalammar.github.io/how-gpt3-works…
END/🧵
