Why does ChatGPT work so well? Is it “just scaling up GPT-3” under the hood? In this 🧵, let’s discuss the “Instruct” paradigm, its deep technical insights, and a big implication: “prompt engineering” as we know it will likely disappear soon:👇
The original GPT-3 was trained with a minimalist objective: predict the next word on a massive text corpus. Many abilities emerge as if by magic, such as reasoning, coding, and translation. You can even do “few-shot learning”: define new tasks by providing I/O examples in context. 1/
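Here’s a minimal sketch of what a few-shot prompt looks like (the translation task is illustrative, in the spirit of the GPT-3 paper; no weights change, the task lives entirely in the prompt):

```python
# A few-shot prompt: the task is defined purely by in-context I/O examples.
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Fed to GPT-3, this is likely to be completed with "fromage".
```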
It’s not at all obvious why simply predicting the next word can give us such abilities. One intuitive explanation is to imagine a detective story. Suppose the model needs to fill in the last blank: “the murderer is ___”, then it has to do deep reasoning to answer correctly. 2/
But this is not enough. In practice, we have to coax GPT-3 to autocomplete what we desire by carefully curating the examples, wording, and structure. This is exactly “prompt engineering”, where users have to practice the awkward and sometimes nonsensical vernacular of LLMs. 3/
Prompt engineering is a BUG🐞, not a feature! It’s caused by the fundamental *misalignment* between the next-word objective and the actual user intent in real applications. Example: you want GPT-3 to “Explain the moon landing to a 6yo”. It replies like a drunk parrot🦜: 4/
Prompt engineering is even worse in DALLE2 and Stable Diffusion. Just go to lexica.art and see how insane some prompts are. My favorite is the “parentheses trick” - adding (((…))) sometimes gives you better images 😅. It’s both hilarious and embarrassing. 5/
ChatGPT and its sibling model InstructGPT address this plague in an elegant way. The key observation is that alignment is very hard to capture from in-the-wild data. Humans must be in the loop to help tutor GPT, and GPT will ask better questions as it improves. 6/
There are 3 steps. The first is very straightforward: just collect a dataset of human-written answers to prompts that users submit, and finetune GPT with supervised learning. It’s the simplest step but also the most costly: writing long responses is slow and painful for humans. 7/
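A rough sketch of step 1, assuming a HuggingFace-style causal LM; `model`, `tokenizer`, and `sft_dataset` are illustrative placeholders, not the actual pipeline:

```python
import torch

# Step 1 sketch: supervised fine-tuning on human demonstrations.
# `sft_dataset` yields (prompt, human_written_answer) string pairs.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for prompt, answer in sft_dataset:
    tokens = tokenizer(prompt + answer, return_tensors="pt")
    # Same next-token objective as pre-training, but now on curated
    # demonstrations instead of raw web text.
    loss = model(**tokens, labels=tokens["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```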
Step 2 is much more interesting. GPT is asked to *propose* a few different answers, and all a human annotator needs to do is *rank* the responses from most desirable to least. Using these labels, we can train a reward model that captures human *preferences*. 8/
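The InstructGPT recipe turns each ranking into pairwise comparisons and trains the reward model so the preferred answer scores higher. A minimal sketch, where `reward_model` is a hypothetical scorer mapping a prompt + answer to a scalar:

```python
import torch.nn.functional as F

# Step 2 sketch: pairwise preference loss for the reward model.
def preference_loss(reward_model, prompt, chosen, rejected):
    r_chosen = reward_model(prompt, chosen)      # scalar score
    r_rejected = reward_model(prompt, rejected)  # scalar score
    # -log sigmoid(r_chosen - r_rejected): push the preferred answer's
    # score above the rejected one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```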
In reinforcement learning (RL), the reward function is typically hardcoded, such as the game score in Atari games. ChatGPT’s data-driven reward model is a powerful idea. Another example is our recent MineDojo work that learns reward from tons of Minecraft YouTube videos: 9/
Step 3: treat GPT as a policy and optimize it with RL against the learned reward. PPO is chosen as a simple and effective training algorithm. Now that GPT is better aligned, we can rinse and repeat steps 2-3 to improve it continuously. It’s like CI for LLMs! 10/
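In RLHF-style training, the PPO reward is typically the RM score minus a KL penalty that keeps the policy close to the supervised model, so it can’t drift into reward-hacking gibberish. A rough sketch (`beta` and all inputs are illustrative):

```python
# Step 3 sketch: per-response reward signal for PPO.
# policy_logprobs / sft_logprobs: log-probs of the sampled tokens under the
# current policy and the frozen supervised (Step 1) reference model.
def rlhf_reward(rm_score, policy_logprobs, sft_logprobs, beta=0.02):
    kl_penalty = (policy_logprobs - sft_logprobs).sum()
    return rm_score - beta * kl_penalty
# PPO then maximizes this scalar, treating GPT as the policy.
```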
This is the “Instruct” paradigm - a super effective way to do alignment, as evident in ChatGPT’s mindblowing demos. The RL part also reminds me of the famous P=NP (or ≠) problem: it tends to be much easier to verify a solution than actually solving the problem from scratch. 11/
Similarly, humans can quickly assess the quality of GPT’s output, but it’s much harder and cognitively taxing to write out a full solution. InstructGPT exploits this fact to lower the manual labeling cost significantly, making it practical to scale up the model CI pipeline. 12/
Another interesting connection: the Instruct training loop looks a lot like a GAN. ChatGPT is the generator and the reward model (RM) is the discriminator. ChatGPT tries to fool RM, while RM learns (with human help) to spot answers that feel off. The game converges when RM can no longer tell the difference. 13/
Model alignment with user intent is also making its way to image generation! There are some preliminary works, such as arxiv.org/abs/2211.09800. Given the explosive AI progress, how long will it take to have an Instruct- or Chat-DALLE that feels like talking to a real artist? 14/
So folks, enjoy prompt engineering while it lasts! It’s an unfortunate historical artifact - a bit like alchemy🧪, neither art nor science. Soon it will just be “prompt writing” - my grandma can get it right on her first try. No more magic incantations to coerce the model. 15/
Of course, ChatGPT isn’t yet good enough to eliminate prompt engineering entirely, but the trend is an unstoppable force. Meanwhile, the model has other serious issues: hallucination & habitual BS. I covered this in another thread: 16/
Further reading: the reward model has scaling laws too: arxiv.org/abs/2210.10760! Also, the RM is only an imperfect proxy (unlike an Atari score), so it’s a bad idea to over-optimize against it. This paper is from @johnschulman2, inventor of PPO. Super interesting work that flew under the radar. 18/
There are also other artifacts caused by the misalignment problem, such as prompt hacking or “injection”. I actually like this one because it allows us to bypass OpenAI’s prompt prefix and fully unleash the model 😆. See @goodside’s cool findings: 19/
ChatGPT is the closest model we have that looks like AGI, but still suffers from hallucination and tends to write plausible-looking bs with uncanny confidence. It could be misleading and even dangerous. Can we fix it? There is a simple & intuitive remedy:🧵
Let’s first identify the root cause. Hallucinating facts is an unfortunate side effect of trying to *memorize* everything in the 100B+ model parameters of ChatGPT. Even humans have unreliable and made-up memories, so of course ChatGPT is no exception. 2/
We are no strangers to false facts ourselves, which surged during the pandemic and elections. Hiring “fact checkers” was the solution we came up with to reduce human-generated bs. It’s actually similar to the “retrieval” mechanism in LLM terminology. 3/
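A toy sketch of the retrieval idea: fetch supporting documents first, then let the model answer grounded in them rather than in its parametric memory. `search_index` and `llm` are hypothetical placeholders, not a specific library:

```python
# Retrieval-augmented answering, schematically.
def answer_with_retrieval(question, search_index, llm, k=3):
    docs = search_index.top_k(question, k)  # e.g. keyword or embedding search
    context = "\n\n".join(docs)
    prompt = f"Answer using only the sources below.\n\n{context}\n\nQ: {question}\nA:"
    return llm.generate(prompt)
```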
Excited to go to NeurIPS conference tomorrow! It's an annual gala for AI. Many revolutionary ideas debuted here, like AlexNet, Transformer, & GPT-3.
I read *all 15* Outstanding Papers and can’t wait to share my thoughts with you all. Here's your front row seat to the festival:🧵
For each paper, I’ll give a TLDR and a note on why I think it’s significant. I may also link interesting blogs and websites that dive into greater depth. Original authors are welcome to chime in and expand the discussion or correct any mistakes! Tweet index is by paper.
Training Compute-Optimal Large Language Models. Hoffmann et al, @DeepMind. TLDR: introduces a new 70B LM called "Chinchilla”🐭 that outperforms much bigger LMs (GPT-3, Gopher). To be compute-optimal, model size and training data must be scaled up in equal proportion. 1.1/
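Back-of-the-envelope version of the rule of thumb people took from the paper: roughly ~20 training tokens per parameter, with training compute ≈ 6·N·D FLOPs. Both are approximations for illustration, not exact paper numbers:

```python
# Rough compute-optimal scaling estimate (rule-of-thumb approximation).
def chinchilla_estimate(n_params):
    tokens = 20 * n_params          # ~20 tokens per parameter
    flops = 6 * n_params * tokens   # training compute ≈ 6 * N * D
    return tokens, flops

tokens, flops = chinchilla_estimate(70e9)   # a Chinchilla-scale 70B model
print(f"{tokens:.1e} tokens, {flops:.1e} FLOPs")  # ~1.4e+12 tokens, ~5.9e+23 FLOPs
```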
GPT-3 is powerful but blind. The future of Foundation Models will be embodied agents that proactively take actions, endlessly explore the world, and continuously self-improve. What does it take? In our NeurIPS Outstanding Paper “MineDojo”, we provide a blueprint for this future:🧵
We argue that there are 3 main ingredients for generalist agents to emerge. First, an open-ended environment that allows an unlimited variety of tasks and goals. Earth is one example, as it is rich enough to forge an ever-expanding tree of life forms and behaviors. What else? 2/
Second, a large-scale knowledge base that teaches an AI not only *how* to do things, but also *what* are the useful things to do. GPT-3 learns from web text alone, but can we give our agent much richer data, such as video walkthroughs, multimedia tutorials, and free-form wiki? 3/
Today a 120B model called “Galactica” is open-sourced by @paperswithcode. It’s capable of writing math notation, citations, code, chemical formulas, DNA sequences, etc. Here’s why I think Galactica is a huge milestone in open foundation models, scientific automation, and responsible AI: 🧵
Large language models have personalities. They are shaped not by the architecture, but by the training data. Models like GPT-3 and OPT are trained on text scraped from the internet at large, which unfortunately contains lots of irrelevant, misinformed, or toxic content. 2/🧵
In contrast, scientific texts like academic papers are mostly immune from these data plagues. They contain analytical text with a neutral tone, knowledge backed by evidence, and are written by people who wish to inform rather than inflame. A dataset born in the ivory tower. 3/🧵
We trained a transformer called VIMA that ingests *multimodal* prompts and outputs controls for a robot arm. A single agent is able to solve visual goal reaching, one-shot imitation from video, novel concept grounding, visual constraint satisfaction, etc. Strong scaling with model capacity and data!🧵
We envision that a generalist robot agent should have an intuitive and expressive interface for task specification, but text alone is not enough. We introduce a novel multimodal prompting framework that converts a wide spectrum of robotic tasks into one sequence modeling problem.
Our VIMA model (reads “v-eye-ma”) consists of a pre-trained T5 to encode multimodal prompts, and a transformer decoder to predict robot arm commands autoregressively. The decoder has alternating self- and cross-attention layers conditioned on the prompt.
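A schematic sketch of the decoder structure described above (not the actual VIMA code; dimensions and layer layout are illustrative). Self-attention runs over the agent’s own trajectory tokens, while cross-attention conditions each step on the T5-encoded multimodal prompt; in the real model these layer types alternate across the stack:

```python
import torch.nn as nn

# Illustrative decoder block: causal self-attention over trajectory tokens,
# plus cross-attention into the encoded multimodal prompt.
class PromptConditionedBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, prompt_tokens, causal_mask=None):
        # Causal self-attention: each action token sees only the past.
        x = x + self.self_attn(x, x, x, attn_mask=causal_mask)[0]
        # Cross-attention: condition on the T5-encoded multimodal prompt.
        x = x + self.cross_attn(x, prompt_tokens, prompt_tokens)[0]
        return x + self.ffn(x)
```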