Research Scientist @OpenAI. Past: @Google Brain / PhD @MIT
Sep 19, 2024 • 11 tweets • 3 min read
Here is my talk at @MIT (after some delay😅)
I made this talk last year when I was thinking about a paradigm shift. This delayed posting is timely as we just released o1, which I believe is a new paradigm.
It's a good time to zoom out for high-level thinking.
I gave an invited lecture on Instruction finetuning and RLHF for @hhexiy 's class at NYU.
One unique perspective in my lecture is that I introduce RLHF as an instance of using a learned objective function.
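To make the "learned objective" framing concrete, here is a minimal runnable sketch (my own illustration, not code from the lecture): a reward model is first fit to human preference pairs with a Bradley-Terry loss, and that learned scalar reward, rather than a hand-written loss, then becomes the objective the policy is optimized against.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in so the sketch runs end to end; a real setup would use a
# pretrained LM backbone for both the reward model and the policy.
class TinyRewardModel(nn.Module):
    """Maps a token-id sequence to a single scalar score."""
    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, ids):  # ids: [batch, seq_len]
        return self.head(self.emb(ids).mean(dim=1)).squeeze(-1)  # [batch]

# Stage 1: LEARN the objective. Fit the reward model on human preference
# pairs with a Bradley-Terry (pairwise logistic) loss.
def reward_loss(rm, chosen_ids, rejected_ids):
    return -F.logsigmoid(rm(chosen_ids) - rm(rejected_ids)).mean()

# Stage 2: USE the learned objective. The policy maximizes the learned
# reward instead of matching fixed targets. A REINFORCE-style surrogate is
# shown for brevity; PPO is the more common choice in practice.
def policy_loss(logprobs, rewards):
    # logprobs: [batch] summed log-probs of sampled responses under the policy
    # rewards:  [batch] scores from the (frozen) reward model
    return -(rewards.detach() * logprobs).mean()

rm = TinyRewardModel()
chosen = torch.randint(0, 100, (4, 12))    # preferred responses (token ids)
rejected = torch.randint(0, 100, (4, 12))  # dispreferred responses
loss_rm = reward_loss(rm, chosen, rejected)        # train the reward model
fake_logprobs = torch.randn(4, requires_grad=True) # stand-in for policy output
loss_pi = policy_loss(fake_logprobs, rm(chosen))   # then optimize the policy
```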
Video:
Slides: docs.google.com/presentation/d…
1/ I start with instruction finetuning and discuss 2 flavors of data:
a) academic datasets (e.g. Flan)
b) the "API" data
The distinction is important because a) has long inputs and short targets, while we are increasingly interested in behaviors that require long-form generation.
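A concrete (invented) illustration of the two flavors, assuming a simple input/target record format:

```python
# Hypothetical records illustrating the two data flavors (contents invented).

# a) Academic-style (e.g. Flan): long input, short target.
flan_style = {
    "input": (
        "Read the passage and answer the question.\n"
        "Passage: <several paragraphs of context> ...\n"
        "Question: In what year was the treaty signed?"
    ),
    "target": "1648",
}

# b) "API"-style: short prompt, long free-form response.
api_style = {
    "input": "Write a short story about a robot learning to paint.",
    "target": "<a multi-paragraph, long-form generation> ...",
}
```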
May 16, 2023 • 4 tweets • 1 min read
In 2013, I took a class taught by Prof Strang. At the time he had been teaching at MIT for 52 years.
He taught me how to like Linear Algebra and how invaluable teaching is.
A few years later, I took a machine learning class that Prof. Strang audited. Once, he was standing in the aisle of the classroom (26-100).
It was so inspiring to have him as a classmate for an hour 🙂
Fun fact: that classroom happens to be the one where he gave his final lecture!
Mar 23, 2023 • 4 tweets • 1 min read
Hey Bard...? 🤔
This particular failure is surprisingly reproducible
Oct 21, 2022 • 9 tweets • 3 min read
New paper + models!
We extend instruction finetuning by:
1. scaling to a 540B model
2. scaling to 1.8K finetuning tasks
3. finetuning on chain-of-thought (CoT) data (see the example sketch below)
With these, our Flan-PaLM model achieves a new SoTA of 75.2% on MMLU.
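To make item 3 concrete: a CoT finetuning example puts the intermediate reasoning in the target, not just the final answer. A minimal illustration (the contents are my own, in the style of grade-school math word problems, not taken from the paper's data):

```python
# Hypothetical CoT finetuning record: the target spells out the reasoning
# steps before the final answer, so the model learns to generate rationales.
cot_example = {
    "input": (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
        "each. How many tennis balls does he have now? "
        "Let's think step by step."
    ),
    "target": (
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 balls. 5 + 6 = 11. The answer is 11."
    ),
}
```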
We study the scaling phenomena of instruction finetuning and find that scaling both the model size and the number of tasks greatly improves performance.
Key takeaway: we should continue to scale instruction finetuning!
(though gains were slightly smaller after 282 tasks)