Jason Wei
ai researcher @openai
May 3, 2023
Since GPT-4, some have argued that emergence in LLMs is overstated, or even a "mirage". I don't think these arguments debunk emergence, but they warrant discussion (it's generally good to examine scientific phenomena critically).

A blog post: jasonwei.net/blog/common-ar…

🧵⬇️

Argument 1: Emergence occurs for "hard" evaluation metrics like exact match or multiple choice, and if you use metrics that award partial credit, then performance improves smoothly (arxiv.org/abs/2304.15004).
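To make the metric argument concrete, here is a minimal Python sketch (the task, target, and model outputs are invented for illustration, not taken from the paper): the same sequence of predictions looks like a sudden jump under exact match but improves smoothly under a per-token partial-credit metric.

```python
# Score the same hypothetical predictions two ways.
# Exact match is all-or-nothing; per-token accuracy gives partial credit.

def exact_match(pred: str, target: str) -> float:
    return float(pred == target)

def per_token_accuracy(pred: str, target: str) -> float:
    pred_toks, target_toks = pred.split(), target.split()
    correct = sum(p == t for p, t in zip(pred_toks, target_toks))
    return correct / max(len(target_toks), 1)

target = "the answer is 42"
# Invented outputs from hypothetical models of increasing scale.
preds_by_scale = [
    "a answer was 7",    # smallest model: 1/4 tokens correct
    "the answer was 7",  # 2/4 tokens correct
    "the answer is 7",   # 3/4 tokens correct
    "the answer is 42",  # largest model: all tokens correct
]

for pred in preds_by_scale:
    print(f"EM={exact_match(pred, target):.0f}  "
          f"token_acc={per_token_accuracy(pred, target):.2f}")
# EM reads 0, 0, 0, 1 (looks emergent); token accuracy reads
# 0.25, 0.50, 0.75, 1.00 (looks smooth).
```

Whether partial-credit metrics are the right thing to measure is, of course, part of the debate.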
Mar 16, 2023
I’m hearing chatter of PhD students not knowing what to work on.
My take: as LLMs are deployed IRL, the importance of studying how to use them will increase.
Some good directions IMO (no training):
1. prompting
2. evals
3. LM interfaces
4. safety
5. understanding LMs
6. emergence

1. Prompting research. Maybe a hot take, but I think we've just reached the tip of the iceberg on the best ways to prompt language models. As language model capabilities increase, the degrees of freedom for guiding a particular generation via a good prompt will increase.
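As a tiny illustration of those degrees of freedom, here is a hedged sketch (the question and prompt wording are invented) contrasting a direct prompt with one that elicits intermediate reasoning before the answer:

```python
question = "A train leaves at 3pm and the trip takes 2.5 hours. When does it arrive?"

# Direct prompt: ask for the answer immediately.
direct_prompt = f"Q: {question}\nA:"

# Reasoning-first prompt: same task, but the prompt steers the model
# to generate intermediate steps before the final answer.
reasoning_prompt = f"Q: {question}\nA: Let's think step by step."

# Either string goes to the same completion endpoint; only the prompt
# differs, yet on many reasoning tasks that difference substantially
# changes what a large model generates.
print(direct_prompt)
print(reasoning_prompt)
```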
Mar 13, 2023
Hot take supported by evidence: for a given NLP task, it is unwise to extrapolate performance to larger models because emergence can occur.

I manually examined all 202 tasks in BIG-Bench, and the most common category was scaling behavior that *unpredictably* increases. So the claim that emergent/unpredictable scaling behavior is "cherrypicked" is simply untrue.

However, it is true that loss on a broad test set or aggregate performance on BIG-Bench can improve predictably. But for a single downstream task this is simply not the case.
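For a sense of what that per-task categorization can look like, here is a minimal sketch (the category names, thresholds, and curves are illustrative assumptions, not the exact criteria used in the analysis):

```python
# Classify a task's scaling curve from accuracies at increasing model scales.
# Thresholds and labels below are illustrative assumptions.

def categorize_scaling(accuracies: list[float],
                       flat_tol: float = 0.05,
                       jump_tol: float = 0.30) -> str:
    """Label a scaling curve as flat, smooth, or emergent."""
    total_gain = accuracies[-1] - accuracies[0]
    if total_gain < flat_tol:
        return "flat"  # no meaningful improvement with scale
    # Largest single jump between adjacent model scales.
    max_jump = max(b - a for a, b in zip(accuracies, accuracies[1:]))
    if max_jump > jump_tol:
        return "emergent"  # most of the gain arrives suddenly
    return "smooth"  # gradual improvement you could extrapolate

# Invented accuracy curves across four model scales.
print(categorize_scaling([0.10, 0.11, 0.10, 0.12]))  # -> flat
print(categorize_scaling([0.10, 0.25, 0.45, 0.70]))  # -> smooth
print(categorize_scaling([0.10, 0.12, 0.11, 0.65]))  # -> emergent
```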
Feb 9, 2023
Studying emergent abilities of language models can seem out of reach for researchers without access to Google/DeepMind models.

A 🧵 with some unexplored ideas to study emergence using (1) the free codex API, (2) flan-t5, or (3) big-bench paper analysis.

(1) Many don't know, but the code-* API is free, and you can run three sizes of models: code-cushman-001, code-davinci-001, and code-davinci-002. code-davinci-002 is comparable with PaLM.

To get more model scales, small models such as text-ada-001 or text-curie-001 can be evaluated relatively cheaply.
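A minimal sketch of running such a cross-scale eval with the completion endpoint of the openai Python library from that era (the API key, prompt, and scoring are placeholders, and this 0.x-style interface has since changed, so treat it as a sketch rather than current usage):

```python
import openai  # 0.x-era interface assumed

openai.api_key = "YOUR_API_KEY"  # placeholder

# Hypothetical few-shot arithmetic probe.
PROMPT = "Q: 17 + 25\nA: 42\nQ: 38 + 14\nA:"
TARGET = "52"

# Smallest to largest of the code-* models mentioned above.
MODELS = ["code-cushman-001", "code-davinci-001", "code-davinci-002"]

for model in MODELS:
    resp = openai.Completion.create(
        model=model,
        prompt=PROMPT,
        max_tokens=5,
        temperature=0,  # greedy decoding for reproducible evals
    )
    answer = resp["choices"][0]["text"].strip()
    print(f"{model}: {answer!r}  exact_match={answer == TARGET}")

# Repeat over a full task and plot exact match against model scale
# to look for a sudden, emergent jump.
```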
Jan 25, 2023
Yesterday I gave a lecture at @Stanford's CS25 class on Transformers!

The lecture was on how “emergent abilities” are unlocked by scaling up language models. Emergence is one of the most exciting phenomena in large LMs…

Slides: docs.google.com/presentation/d…

Throughout the past year, hundreds of emergent abilities have been observed, abilities that appear only in large-enough language models. I previously made a list of more than 100 of them:

Jun 12, 2022
Re-sharing a consciousness piece I wrote a while back:

If language models can generate a "stream of consciousness" indistinguishable from a human's, why aren't they conscious?

jasonwei20.github.io/files/artifici…

1. I first argue that language models like GPT-3 can generate a stream of thought similar to the cascade of thoughts that seems to arise in our minds.