A recent clarity I gained is viewing AI research as a “max-performance domain”, meaning you can be world-class by being very good at only one part of your job. As long as you can create seminal impact (e.g., train the best model, start a new paradigm, or create widely adopted benchmarks), it doesn’t matter if you’re incompetent at adjacent skills. For example, I have seen game-changing AI researchers with horrendous presentation skills, terrible political awareness, and no thought for their career progression. Heck, I even know a top AI researcher who probably wouldn’t pass a basic coding interview. But it doesn’t matter. Exceptional ability at a single thing outweighs incompetence at other parts of the job.
In max-performance domains, you don’t even need to be good at your one thing in a consistent way. An AI researcher can have tens of failed projects per year and still be successful if they produce a seminal work every few years. The metric is the best five works in your career, not the average.
A dangerous thing in max-performance domains is placing too much emphasis on role models, because you don’t know whether you’re mimicking their good characteristics or not. For example, a top AI researcher can make a bad political move that turns out OK for them because of who they are. Or they can make a bold, unsubstantiated statement and expect other people to listen. But if anyone else did the same thing, the outcome would be the opposite.
Another way to view max-performance domains is that they have exponential upside and very little downside. That’s why interviews are especially useless in domains like AI research, because they tend to severely punish mistakes and don’t capture exponential value. An RL expert doesn’t need to know how SVMs work and probably hasn’t thought about it in years. A top AI infra engineer might lack basic knowledge about post-training data practices.
In my view it’s a luxury to work in a max-performance domain. Failure is allowed and stress is usually self-imposed. A thousand years ago, very few humans worked in max-performance domains, but now the opportunity is more available. Technology may have played a role in this shift, and with the progression of AI, hopefully more of humanity can move into max-performance domains.
(If you're wondering what a non-max-performance domain looks like, it's any career where you must have strengths and also basically no weaknesses. For example, a defender in soccer might cost their team the entire game with a single mistake. A pianist must master every part of their concerto, not just a single passage.)
• • •
Since GPT-4, some have argued that emergence in LLMs is overstated, or even a "mirage". I don't think these arguments debunk emergence, but they warrant discussion (it's generally good to examine scientific phenomena critically).
Argument 1: Emergence occurs for “hard” evaluation metrics like exact match or multiple-choice, and if you use metrics that award partial credit, then performance improves smoothly (arxiv.org/abs/2304.15004).
Response 1A: Sure, you can find some metric that improves smoothly, but if the metric we ultimately care about is the one that improves in an emergent fashion, then that is what matters.
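To make that concrete, here is a toy sketch (made-up strings, not real model outputs) of how the same completions score under exact match versus a partial-credit similarity metric:

```python
# Toy illustration (made-up strings, not real model outputs) of how the choice
# of metric changes the apparent scaling curve on a 3-digit addition problem.
import difflib

def exact_match(prediction: str, target: str) -> float:
    """Hard metric: full credit only for an exact match."""
    return float(prediction.strip() == target.strip())

def char_similarity(prediction: str, target: str) -> float:
    """Soft metric: partial credit based on character-level similarity."""
    return difflib.SequenceMatcher(None, prediction.strip(), target.strip()).ratio()

target = "746"
hypothetical_outputs = {
    "small model":  "lots",  # not even a number
    "medium model": "740",   # close, but wrong
    "large model":  "746",   # exactly right
}

for scale, pred in hypothetical_outputs.items():
    print(f"{scale}: exact match = {exact_match(pred, target)}, "
          f"partial credit = {char_similarity(pred, target):.2f}")
# Exact match sits at 0.0 and then jumps to 1.0 at the largest scale, while the
# partial-credit metric improves gradually -- the crux of the "mirage" argument.
```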
I’m hearing chatter of PhD students not knowing what to work on.
My take: as LLMs are deployed IRL, the importance of studying how to use them will increase.
Some good directions IMO (no training): 1. prompting 2. evals 3. LM interfaces 4. safety 5. understanding LMs 6. emergence
1. Prompting research. Maybe a hot take, but I think we’ve only reached the tip of the iceberg on the best ways to prompt language models. As language model capabilities increase, the degrees of freedom for guiding a particular generation via a good prompt will also increase.
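As a concrete (hypothetical) illustration of those degrees of freedom, here is a sketch comparing a direct prompt with a chain-of-thought-style prompt for the same question; `generate` is just a placeholder for whatever completion API you use:

```python
# A minimal sketch of two prompting strategies for the same question.
# `generate` is a placeholder, not a real API call -- plug in whatever
# completion endpoint you use.

def generate(prompt: str) -> str:
    raise NotImplementedError("call your language model of choice here")

question = ("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?")

# Strategy 1: direct prompting -- ask for the answer immediately.
direct_prompt = f"Q: {question}\nA:"

# Strategy 2: chain-of-thought-style prompting -- show a worked example whose
# answer is written out step by step, nudging the model to reason before answering.
cot_prompt = (
    "Q: There are 15 trees in the grove. After workers plant more, there are 21 trees. "
    "How many trees did they plant?\n"
    "A: There were 15 trees and now there are 21, so they planted 21 - 15 = 6 trees. "
    "The answer is 6.\n\n"
    f"Q: {question}\nA:"
)

# answer_direct = generate(direct_prompt)
# answer_cot = generate(cot_prompt)
```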
2. Building evaluations. Many benchmarks get saturated quickly, and we need more to evaluate the frontier of language models. In addition, it’s still an open question how to evaluate language models in general. The new OpenAI evals library could be good: github.com/openai/evals
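For a sense of what a minimal eval looks like, here is a bare-bones exact-match harness in the spirit of (but not using) the OpenAI evals library; `model_fn` and the sample format are illustrative assumptions:

```python
# A bare-bones exact-match eval harness, in the spirit of (but not using) the
# OpenAI evals library. `model_fn` stands in for any callable mapping a prompt
# string to a completion string; the samples are illustrative.
from typing import Callable, Dict, List

def run_eval(model_fn: Callable[[str], str], samples: List[Dict[str, str]]) -> float:
    """Return exact-match accuracy over samples with "prompt" and "ideal" fields."""
    correct = 0
    for sample in samples:
        completion = model_fn(sample["prompt"]).strip()
        correct += completion == sample["ideal"].strip()
    return correct / len(samples)

samples = [
    {"prompt": "What is the capital of France?\nAnswer:", "ideal": "Paris"},
    {"prompt": "2 + 2 =", "ideal": "4"},
]

# accuracy = run_eval(my_model, samples)
```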
Hot take supported by evidence: for a given NLP task, it is unwise to extrapolate performance to larger models because emergence can occur.
I manually examined all 202 tasks in BIG-Bench, and the most common category was for the scaling behavior to *unpredictably* increase.
So the idea that emergent/unpredictable scaling behavior is "cherrypicked" is simply untrue.
However, it is true that loss on a broad test set or aggregate performance on BIG-Bench can improve predictably. But for a single downstream task this is simply not the case.
For a list of the 67 tasks in BIG-Bench that are emergent, see
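If you wanted a rough, automated version of this kind of categorization, a heuristic like the sketch below is one option; the thresholds are arbitrary choices for illustration, not the criteria used in the manual categorization above:

```python
# A rough heuristic for flagging "emergent-looking" scaling curves: accuracy
# stays near the random baseline for smaller models, then clears it by a wide
# margin at the largest scale. The thresholds are arbitrary choices for
# illustration, not the criteria used in the manual categorization above.
from typing import List

def looks_emergent(
    accuracies: List[float],   # task accuracy, ordered from smallest to largest model
    random_baseline: float,    # chance-level accuracy for the task
    flat_tol: float = 0.05,    # how close to chance counts as "flat"
    jump_tol: float = 0.10,    # how far above chance counts as a "jump"
) -> bool:
    near_chance = [a for a in accuracies if a <= random_baseline + flat_tol]
    return (
        len(near_chance) >= len(accuracies) // 2          # most scales sit at chance...
        and accuracies[-1] >= random_baseline + jump_tol  # ...but the largest clears it
    )

# Example: 4-way multiple choice (25% chance), five model scales.
print(looks_emergent([0.25, 0.24, 0.26, 0.27, 0.55], random_baseline=0.25))  # True
print(looks_emergent([0.30, 0.38, 0.45, 0.52, 0.60], random_baseline=0.25))  # False
```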
(1) Many don't know, but the code-* API is free, and you can run three sizes of models: curie-001, davinci-001, and davinci-002. Davinci-002 is comparable to PaLM.
To get more model scales, small models such as text-ada-001 or ada-curie can be evaluated relatively cheaply.
(1 cont.) It's possible to write entire papers just using the codex API for free, as many people have.
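For reference, here is a sketch of querying several model sizes with the same prompt, assuming the pre-1.0 `openai` Python client; the client interface has since changed and these model names have been retired, so treat it as illustrative only:

```python
# A sketch of querying several model sizes with the same prompt, assuming the
# pre-1.0 `openai` Python client. The client interface has since changed and
# these model names have been retired, so treat this as illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

MODELS = ["text-ada-001", "text-curie-001", "code-davinci-002"]  # roughly smallest to largest

prompt = "Q: What is 123 + 456?\nA:"

for model in MODELS:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=16,
        temperature=0,  # greedy-ish decoding, useful for evaluation
    )
    print(model, response["choices"][0]["text"].strip())
```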
Throughout the past year, hundreds of emergent abilities have been documented, abilities that can only be observed in large-enough language models. I previously made a list of more than 100 of them:
One of the most interesting emergent abilities IMO is instruction tuning.
Results from Anthropic and Flan-LaMDA suggest that zero-shot performance can improve from RLHF and from instruction tuning on NLP benchmarks (although text-davinci usually loses to code-davinci).
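For intuition, here is a sketch of the data-formatting idea behind instruction tuning: rewrite an existing NLP example as a natural-language instruction so a model finetuned on many such pairs can follow unseen instructions zero-shot (the template is illustrative, not an actual FLAN template):

```python
# A sketch of the data-formatting idea behind instruction tuning: rewrite an
# existing NLP example as a natural-language instruction, so that a model
# finetuned on many such (instruction, target) pairs can follow unseen
# instructions zero-shot. The template is illustrative, not an actual FLAN template.

def to_instruction_example(premise: str, hypothesis: str, label: str) -> dict:
    """Turn an NLI example into an (instruction, target) training pair."""
    instruction = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes, no, or maybe."
    )
    return {"input": instruction, "target": label}

example = to_instruction_example(
    premise="The dog is sleeping on the couch.",
    hypothesis="An animal is resting indoors.",
    label="yes",
)
print(example["input"])
print("->", example["target"])
```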
1. I first argue that language models like GPT-3 can generate a stream of thought similar to the cascade of thoughts that seems to arise in our minds.
2. This "artificial stream of thought" meets the “what it is like” definition of phenomenal consciousness, which states that something is consciousness if it is "like something" to be it.
We know what it is like to be GPT-3: just read its stream of consciousness!