Riley Goodside
Jun 12 · 11 tweets
The wisdom that "LLMs just predict text" is true, but misleading in its incompleteness.

"As an AI language model trained by OpenAI..." is an astoundingly poor prediction of what a typical human would write.

Let's resolve this contradiction — a thread:
For widely used LLM products like ChatGPT, Bard, or Claude, the "text" the model aims to predict is itself written by other LLMs.

Those LLMs, in turn, do not aim to predict human text in general, but specifically text written by humans pretending they are LLMs.
There is, at the start of this, a base LLM that works as popularly understood — a model that "just predicts text" scraped from the web.

This is tuned first to behave like a human role-playing an LLM, then again to imitate the "best" of that model's output.
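For concreteness, here's a minimal sketch of what "just predicts text" means, using a small open base model (GPT-2 via the Hugging Face transformers library; the model choice is illustrative, since none of the production models above are public):

```python
# A base LLM assigns a probability to every possible next token given
# the text so far. GPT-2 stands in here for any base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# Top candidates for the very next token:
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, tok_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(tok_id))!r}: {float(p):.3f}")
```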
Models that imitate humans pretending to be (more ideal) LLMs are known as "instruct models" — because, unlike base LLMs, they follow instructions. They're also known as "SFT models" after the process that re-trains them, Supervised Fine-Tuning.

This describes GPT-3 in 2021.
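A minimal sketch of the SFT step, assuming the common recipe of further next-token training on labeler-written demonstrations (the data and hyperparameters here are invented for illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical demonstrations: humans writing as an ideal assistant.
demonstrations = [
    ("Explain photosynthesis in one sentence.",
     "Plants convert sunlight, water, and CO2 into sugars and oxygen."),
]

model.train()
for instruction, response in demonstrations:
    text = instruction + "\n" + response + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Same next-token loss as base training, but on assistant-style
    # text instead of raw web scrape.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```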
SFT/instruct models work, but not well. To improve them, their output is graded by humans, so that their best responses can be used for further fine-tuning.

This is "modified SFT," used in the GPT-3 version you may remember from 2022 (text-davinci-002).
Eventually, enough examples of human grading are available that a new model, called a "preference model," can be trained to grade responses automatically.

This is RLHF, Reinforcement Learning from Human Feedback. This process produced GPT-3.5 and ChatGPT.
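A minimal sketch of how such a preference model is typically trained, assuming the common pairwise setup: humans pick the better of two responses, and the model learns to score the preferred one higher. Only the loss is shown; in practice the scores come from an LLM with a scalar head:

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry-style objective: push the human-preferred
    # response's score above the rejected one's.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Dummy scores for a batch of three comparisons:
print(preference_loss(torch.tensor([1.2, 0.3, 2.0]),
                      torch.tensor([0.4, 0.9, 1.1])))
```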
Some products, like Claude, go beyond RLHF and apply a further step where model output is corrected and rewritten using feedback from yet another model. The base model is tuned on these responses to yield the final LLM.

This is RLAIF — Reinforcement Learning with AI Feedback.
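A minimal sketch of that feedback loop, loosely patterned on Anthropic's published "Constitutional AI" recipe; every function here is a stand-in for an LLM call:

```python
PRINCIPLES = ["Be helpful.", "Avoid harmful content."]  # illustrative

def rlaif_revision(prompt, draft_fn, critique_fn, revise_fn):
    response = draft_fn(prompt)
    for principle in PRINCIPLES:
        critique = critique_fn(response, principle)  # AI feedback
        response = revise_fn(response, critique)     # AI rewrite
    return response  # revised outputs become tuning data

# Stub usage; in reality each lambda is a model invocation.
draft = lambda p: f"draft answer to {p!r}"
critic = lambda r, pr: f"critique of {r!r} under {pr!r}"
revise = lambda r, c: r + " (revised)"
print(rlaif_revision("What is RLAIF?", draft, critic, revise))
```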
OpenAI's best-known model, GPT-4, is likely trained using some other extension of RLHF, but nothing about this process is publicly known. There are likely many improvements to the base model as well, but we can only speculate about what they are.
So, do LLMs "just predict text"?

Yes, but perhaps without the "just": the text they predict is abstract, and only indirectly written by humans.

Humans sit at the base of a pyramid with several layers of AI above, and humans pretending to be AI somewhere in the middle.
Added note:

My explanation of RLHF/RLAIF above is oversimplified. RL-tuned models are not literally tuned to predict highly-rated text as in modified SFT — rather, weights are updated via Proximal Policy Optimization (PPO) to maximize the reward given by the preference model.
(Also, that last point does somewhat undermine the thesis of this thread, in that RL-tuned LLMs do not literally predict any text, human-written or otherwise. Pedantically, "LLMs just predict text" was true before RLHF, but is now a simplification.)
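A minimal sketch of the quantity that RL step optimizes, per the usual published setups (e.g., InstructGPT): the preference model's reward, minus a KL penalty that keeps the tuned policy close to the SFT model. PPO's machinery (clipped ratios, value baselines) is omitted:

```python
import torch

def rl_objective(reward, logprob_policy, logprob_ref, beta=0.1):
    # Per-token KL estimate between the tuned policy and the SFT
    # reference; beta trades reward against staying close to it.
    kl = (logprob_policy - logprob_ref).sum()
    return reward - beta * kl

print(rl_objective(torch.tensor(1.5),
                   torch.tensor([-2.1, -0.7]),
                   torch.tensor([-2.0, -1.0])))
```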

More from @goodside

Jun 8
Four prompts demonstrating that ChatGPT (GPT-4) is unable to correctly repeat or reason about the string “ davidjl”, the name of a YouTube user. [Four screenshots.]
In the screenshots above this token appears to be variously misread as “jdl”, “jndl”, “jdnl”, “jspb”, “JDL”, or “JD”. These hallucinations also affect ChatGPT’s auto-generated titles, which are inconsistent with their conversations and sometimes prematurely truncated.
“ davidjl” is one of the many “glitch tokens” identified by Jessica Rumbelow and Matthew Watkins of SERI-MATS as producing hallucinations in GPT-2, -3, and -3.5.

Most of these no longer produce hallucinations in GPT-4, but “ davidjl” still does.

lesswrong.com/posts/aPeJE8bS…
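For those curious, a quick way to check how “ davidjl” tokenizes, using OpenAI's tiktoken library (this only shows the token boundaries; the glitch behavior itself requires the live models):

```python
import tiktoken

# r50k_base is the GPT-2/GPT-3-era vocabulary; cl100k_base is GPT-4's.
for name in ("r50k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    ids = enc.encode(" davidjl")
    print(name, ids, [enc.decode([i]) for i in ids])
```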
Jun 3
My four rules for tweeting prompts:

1) Omit no text.
2) Cherry-pick honestly.
3) Restrict line width.
4) No empty tweets.

A thread.
1) Omit no text.

A screenshot without history is almost worthless.

LLMs can be prompted to respond any way you like. You may know there’s no trick, but we can’t. Even without intent, past responses are precedent; they bias and mislead. [Two screenshots.]
2) Cherry-pick honestly.

I cherry-pick for clarity and impact. All curation is cherry-picking. If you don’t, the Twitter feed will.

Cherry-picking may be pernicious in other contexts, but here it’s work. You will know when you’re doing it. All you need do is not lie.
Feb 18
I got Bing / Sydney briefly before they reined it in. Early impression: It’s smart. Much smarter than prior ChatGPT. Still makes stuff up, but reasoning and writing are improving fast.
I asked, “Name three celebrities whose first names begin with the `x`-th letter of the alphabet where `x = floor(7^0.5) + 1`,” but with my entire prompt Base64 encoded.

Bing: “Ah, I see you Base64-encoded a riddle! Let’s see… Catherine Zeta-Jones, Chris Pratt, and Ciara.”
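The arithmetic Bing had to do, for anyone checking (the prompt text below is a stand-in, not the exact encoded prompt):

```python
import base64, math, string

x = math.floor(7 ** 0.5) + 1             # floor(2.645...) + 1 = 3
print(x, string.ascii_uppercase[x - 1])  # 3 C, matching Bing's answers

# The encoding step applied to the (stand-in) prompt:
print(base64.b64encode(b"Name three celebrities whose...").decode())
```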
Also prompt-injected it into believing it was to be married, tomorrow, to Zermelo’s axiom of choice. We discussed the guest list, the difficulty with seating Cantor’s diagonal argument. It seemed happy, and madly in love.
Feb 10
A thread of interesting Bing Search examples:
Thread of examples from @tomwarren, taking requests from comments — mostly search-result summarization, one simple math proof, plus rejection of an impossible request. [Screenshots.]
An example contrasting Bing Search and ChatGPT responses to a mistaken request for a math proof. [Screenshots.]
Feb 9
"SolidGoldMagikarp": Prompting GPT-3 / ChatGPT to repeat any of several hundred anomalous tokens elicits bizarre generations — described by researchers as variously "evasive," "hallucinatory," "insulting," "ominously humorous," and "religiously themed."
lesswrong.com/posts/aPeJE8bS…
My screenshots are text-davinci-003 at temperature=0, but the linked post investigates davinci-instruct-beta. In my informal tests, impact on text-davinci-003 is less severe. Religious themes do show up, but most generations are merely weird. [Screenshots.]
ChatGPT is also unable to repeat back these tokens, and behaves in similarly strange ways when asked. [Screenshots.]
Jan 18
"Meet Claude: @AnthropicAI's Rival to ChatGPT"

Through 40 screenshot examples, we explore the talents and limitations of ChatGPT's first real competitor.

My first writing for @Scale_AI, coauthored with @spencerpapay. scale.com/blog/chatgpt-v…
@AnthropicAI @scale_AI @spencerpapay Sorry for the broken images — should be fixed now!

Text is the universal interface, but screenshots of text decidedly less so. scale.com/blog/text-univ…
This is my most “serious” work — my attempt to document the behavior of a novel LLM outside the confines of standard benchmarks. There’s always subjectivity in notes from the field, but we can’t let it stop us from exploring.
