Helen Toner · Mar 4
If you spend much time on AI twitter, you might have seen this tentacle monster hanging around. But what is it, and what does it have to do with ChatGPT?

It's kind of a long story. But it's worth it! It even ends with cake 🍰

THREAD:
First, some basics of how language models like ChatGPT work:

Basically, the way you train a language model is by giving it insane quantities of text data and asking it over and over to predict what word[1] comes next after a given passage.
Eventually, it gets very good at this.
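To make that concrete: here's a rough toy sketch of that "predict what comes next" training loop, shrunk down to a character-level model in PyTorch. Everything here (the text, the model, the sizes) is a made-up stand-in; real models are enormously bigger and look at the whole preceding passage, not just one character.

```python
import torch
import torch.nn as nn

# Tiny made-up "training corpus"; real models see mountains of text instead.
text = "the cat sat on the mat. the dog sat on the log."
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}   # character -> integer id

ids = torch.tensor([stoi[ch] for ch in text])
inputs, targets = ids[:-1], ids[1:]            # every position tries to guess the NEXT character

# Toy model: real LMs condition on the whole preceding passage, not just one character.
model = nn.Sequential(
    nn.Embedding(len(vocab), 32),
    nn.Linear(32, len(vocab)),                 # scores for "which character comes next"
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    logits = model(inputs)                                 # the model's guesses at each position
    loss = nn.functional.cross_entropy(logits, targets)    # penalty for guessing the next character wrong
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```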
This training is a type of ✨unsupervised learning✨[2]

It's called that because the data (mountains of text scraped from the internet/books/etc) is just raw information—it hasn't been structured and labeled into nice input-output pairs (like, say, a database of images+labels).
But it turns out models trained that way, by themselves, aren't all that useful.

They can do some cool stuff, like generating a news article to match a lede. But they often find ways to generate plausible-seeming text completions that really weren't what you were going for.[3]
So researchers figured out some ways to make them work better.

One basic trick is "fine-tuning": You partially retrain the model using data specifically for the task you care about.
If you're training a customer service bot, for instance, then maybe you pay some human customer service agents to look at real customer questions and write examples of good responses.

Then you use that nice clean dataset of question-response pairs to tweak the model.
Unlike the original training, this approach is "supervised" because the data you're using is structured as well-labeled input-output pairs.

So you could also call it ✨supervised fine-tuning✨
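Mechanically, this fine-tuning step is just more of the same next-word training, run on the small curated dataset instead of the raw internet scrape. A toy sketch along the same lines as before (the question-response pairs and the tiny model are hypothetical stand-ins for the real thing):

```python
import torch
import torch.nn as nn

# Stand-in for the already-pre-trained model; in reality you'd start from the
# big model trained on raw text, not a fresh tiny one.
vocab = sorted(set("abcdefghijklmnopqrstuvwxyz ?."))
stoi = {ch: i for i, ch in enumerate(vocab)}
model = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# The curated dataset: (customer question, human-written good response) pairs.
sft_pairs = [
    ("how do i reset my password?", "go to settings and choose reset password."),
    ("where is my order?", "you can track it from the orders page."),
]

for question, response in sft_pairs:
    ids = torch.tensor([stoi[ch] for ch in question + " " + response])
    inputs, targets = ids[:-1], ids[1:]        # same next-token objective as pre-training
    loss = nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```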
Another trick is called "reinforcement learning from human feedback," or RLHF.

The way reinforcement learning *usually* works is that you tell an AI model to maximize some kind of score—like points in a video game—then let it figure out how to do that.

RLHF is a bit trickier:
How it works, very roughly, is that you give the model some prompts, let it generate a few possible completions, then ask a human to rank how good the different completions are.

Then, you get your language model to try to learn how to predict the human's rankings...
And then you do reinforcement learning on *that*, so the AI is trying to maximize how much the humans will like text it generates, based on what it learned about what humans like.
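Here's a rough sketch of that "learn to predict the human's rankings" piece. The tiny reward model and the single preference pair are made-up stand-ins, and the final reinforcement-learning step (pushing the language model toward completions this reward model scores highly) is only noted in a comment:

```python
import torch
import torch.nn as nn

# Toy reward model: reads a completion and outputs one score for "how much
# would a human like this?". Real reward models are big language models
# themselves; this stand-in just averages character embeddings into one number.
vocab = sorted(set("abcdefghijklmnopqrstuvwxyz .!"))
stoi = {ch: i for i, ch in enumerate(vocab)}
embed = nn.Embedding(len(vocab), 16)
score_head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(score_head.parameters()), lr=1e-2
)

def reward(text: str) -> torch.Tensor:
    ids = torch.tensor([stoi[ch] for ch in text])
    return score_head(embed(ids).mean(dim=0)).squeeze()

# Made-up human feedback: for one prompt, the rater preferred completion A over B.
preferences = [
    ("thanks for asking! here is how to fix it.", "no."),  # (preferred, rejected)
]

for preferred, rejected in preferences:
    # Pairwise preference loss: push the preferred completion's score above the
    # rejected one's. This is the "learn to predict the human's rankings" step.
    loss = -nn.functional.logsigmoid(reward(preferred) - reward(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step two (not shown): run reinforcement learning (e.g. PPO) so the language
# model learns to generate completions that this reward function scores highly.
```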

So that's ✨RLHF✨

And now we can go back to the tentacle monster!
Now that we know what all the words mean, the picture should make more sense. The idea is that even if we can build tools (like ChatGPT) that look helpful and friendly on the surface, that doesn't mean the system as a whole is like that. Instead...
...Maybe the bulk of what's going on is an inhuman Lovecraftian process that's totally alien to how we think about the world, even if it can present a nice face.

(Note that it's not about the tentacle monster being evil or conscious—just that it could be very, very weird.)
"But wait," I hear you say, "You promised cake!"

You're right, I did. And here's why—because the tentacle monster is *also* a play on a very famous slide by a very famous researcher.
Back in 2016, Yann LeCun (Chief AI Scientist at FB) presented this slide at NeurIPS, one of the biggest AI research conferences.

Back then, there was a lot of excitement about RL as the key to intelligence, so LeCun was making a totally different point...
...Namely, that RL was only the "cherry on top," whereas unsupervised learning was the bulk of how intelligence works.

To an AI researcher, the labels on the tentacle monster immediately recall this cake, driven home by the cheery "cherry on top :)"

END THREAD
A few post-thread "well, technically"s and sources:

[1] Technically LMs predict what "token" comes next, but a token is usually roughly a word, so, close enough (there's a tiny tokenizer snippet after these notes)
[2] Technically people often say "self-supervised learning" rather than "unsupervised learning" for LM pre-training...
...because you could think of "chunk of text"+"next word" as an input-output pair that is automatically present in the dataset—but I guess the tentacle monster artist went with "unsupervised" to match the cake.

[3] Unicorn story from openai.com/research/bette… ...
...and tree poem example from astralcodexten.substack.com/p/janus-simula…

[4] The tentacle monster I used here is from ; I think the original idea is from
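As a small illustration of the words-vs-tokens point in [1], here's a snippet using OpenAI's open-source tiktoken library; the exact splits depend on which tokenizer a given model uses, so the printed pieces are just illustrative.

```python
# Words vs. tokens, using the open-source tiktoken tokenizer. Common words are
# often one token each; rarer words get split into several pieces.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Tentacle monsters love cake")
print(tokens)                               # integer token ids
print([enc.decode([t]) for t in tokens])    # the text piece each id stands for
```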

Thanks for reading!
