Arvind Narayanan

@random_walker

May 16 • 11 tweets • 4 min read

In the late 1960s top airplane speeds were increasing dramatically. People assumed the trend would continue. Pan Am was pre-booking flights to the moon. But it turned out the trend was about to fall off a cliff.

I think it's the same thing with AI scaling — it's going to run out; the question is when. I think more likely than not, it already has.

By 1971, about a hundred thousand people had signed up for flights to the moon. en.wikipedia.org/wiki/First_Moo…

You may have heard that every exponential is a sigmoid in disguise. I'd say every exponential is at best a sigmoid in disguise. In some cases tech progress suddenly flatlines. A famous example is CPU clock speeds. (Ofc clockspeed is mostly pointless but pick your metric.)
Note y-axis log scale. en.wikipedia.org/wiki/File:Cloc…
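To make the "sigmoid in disguise" point concrete, here is a minimal sketch comparing an exponential with a logistic (sigmoid) curve that starts at the same value. The growth rate, starting value, and ceiling are arbitrary illustrative assumptions, not figures from the thread.

```python
import math

RATE = 1.0        # assumed growth rate per unit time (illustrative)
START = 1.0       # assumed value at t = 0 (illustrative)
CEILING = 1000.0  # assumed saturation level of the sigmoid (illustrative)

def exponential(t):
    """Unbounded exponential growth from START."""
    return START * math.exp(RATE * t)

def logistic(t):
    """Logistic curve that equals START at t = 0 and saturates at CEILING."""
    return CEILING / (1.0 + (CEILING / START - 1.0) * math.exp(-RATE * t))

if __name__ == "__main__":
    # Early on the two curves are nearly indistinguishable; later the
    # logistic flattens out while the exponential keeps climbing.
    for t in (0, 2, 4, 6, 8, 10, 15):
        e, s = exponential(t), logistic(t)
        print(f"t={t:>2}  exponential={e:14.1f}  logistic={s:8.1f}  ratio={s / e:.4f}")
```

While the curve is far below its ceiling, the ratio stays close to 1, so early data alone cannot tell the two apart.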

There are 2 main barriers to continued scaling. One is data. It's possible that companies have already run out of high-quality data, and that that's why the flagship models from OpenAI, Anthropic, and Google all have strikingly similar performance (that hasn't improved in > 1y).

What about synthetic data? There seems to be a misconception here: I don't think developers are using it to increase training data volume. This paper has a great list of uses for synthetic data for training, and it's all about fixing specific gaps and making domain-specific improvements like math, code, or low-resource languages: arxiv.org/html/2404.0750…

It's unlikely that mindless generation of synthetic training data will have the same effect as having more high-quality human data.

The 2nd, and IMO bigger barrier to scaling is that beyond a point, scale might lead to better models in the sense of perplexity (next word prediction) but might not lead to downstream improvements (new emergent capabilities).
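For readers unfamiliar with the term, perplexity is the exponential of the average negative log-probability a model assigns to each actual next token. A minimal sketch of that standard definition, where the function name and the example probabilities are made up for illustration:

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model assigned
    to each token that actually occurred. Lower is better; 1.0 is perfect."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

if __name__ == "__main__":
    # A model that always spreads its guess evenly over 4 options
    # behaves like a uniform 4-way choice at every step.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    # Hypothetical per-token probabilities from a more confident model.
    print(perplexity([0.9, 0.6, 0.8, 0.7]))
```

This metric only measures next-token prediction, which is why it can keep improving without any corresponding gain on downstream tasks.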

This gets at one of the core debates about LLM capabilities — are they capable of extrapolation or do they only learn tasks represented in the training data? It's a glass half full / half empty situation and the truth is somewhere in between but I lean toward the latter view.

So if LLMs can't do much beyond what's seen in training, at some point it no longer helps if you have more data because all the tasks that are ever going to be represented in it are already represented. Every traditional ML model eventually plateaus; maybe LLMs are no different.

https://twitter.com/random_walker/status/1781362716281909285

My hunch is that not only has scaling already basically run out, but teams building frontier models already recognize this. If true, it would explain many otherwise perplexing things (I have no inside information):
– No GPT-5 (remember: GPT-4 started training ~2y ago)
– CEOs greatly tamping down AGI expectations
– Shift in focus to the layer above LLMs (e.g. agents, RAG)
– Departures of many AGI-focused people; AI companies starting to act like regular product companies rather than mission-focused

https://twitter.com/random_walker/status/1790702860595867972

In my AI Snake Oil book with @sayashk (aisnakeoil.com/p/ai-snake-oil…), we have a chapter on AGI. We conceptualize the history of AI as a punctuated equilibrium, which we call the ladder of generality (which doesn't imply linear progress). LLMs are already the 7th step in our ladder; an unknown number of steps lie ahead. Historically, standing on each step of the ladder, the AI research community has been terrible at predicting how much farther you can go with the current paradigm, what the next step will be, when it will arrive, what new applications it will enable, and what the implications for safety are.

OpenAI released GPT-3.5 and then GPT-4 just a couple of months later (even though the latter had been in development for a while). This historical accident had the unintended effect of giving people a greatly exaggerated sense of the pace of LLM improvements, and led to a thousand overreactions ranging from influencer bros to x-risk panic (remember the "pause" letter?). It's taken more than a year for the discourse to cool down a bit and start to look more like a regular tech cycle.

More from @random_walker

Arvind Narayanan

@random_walker

Apr 30

On tasks like coding we can keep increasing accuracy by indefinitely increasing inference compute, so leaderboards are meaningless. The HumanEval accuracy-cost Pareto curve is entirely zero-shot models + our dead simple baseline agents.
New research w @sayashk @benediktstroebl 🧵

Link: aisnakeoil.com/p/ai-leaderboa…

This is the first release in a new line of research on AI agent benchmarking. More blogs and papers coming soon. We’ll announce them through our newsletter (AiSnakeOil.com).

Here are the five key takeaways. aisnakeoil.com/p/ai-leaderboa…

Arvind Narayanan

@random_walker

Apr 12

The crappiness of the Humane AI Pin reported here is a great example of the underappreciated capability-reliability distinction in gen AI. If AI could *reliably* do all the things it's *capable* of, it would truly be a sweeping economic transformation.
theverge.com/24126502/human…

The vast majority of research effort seems to be going into improving capability rather than reliability, and I think it should be the opposite.

Most useful real-world tasks require agentic workflows. A flight-booking agent would need to make dozens of calls to LLMs. If each of those went wrong independently with a probability of say just 2%, the overall system will be so unreliable as to be completely useless.
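The arithmetic behind this is simple compounding. A rough sketch, assuming (as the tweet does) that every call must succeed and that failures are independent; the call counts are illustrative stand-ins for "dozens":

```python
import math

def end_to_end_success(per_call_failure, num_calls):
    """Probability that all calls succeed when each call fails
    independently with probability per_call_failure."""
    return (1.0 - per_call_failure) ** num_calls

def calls_until_below(threshold, per_call_failure):
    """Smallest number of calls at which end-to-end success drops
    below the given threshold."""
    return math.ceil(math.log(threshold) / math.log(1.0 - per_call_failure))

if __name__ == "__main__":
    for n in (12, 24, 36, 48):
        print(f"{n} calls at 2% failure each: {end_to_end_success(0.02, n):.1%} success")
    print("calls until success < 50%:", calls_until_below(0.5, 0.02))
```

With a 2% per-call failure rate, 24 calls succeed end to end only about 62% of the time, and the success rate falls below a coin flip at 35 calls.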

Arvind Narayanan

@random_walker

Dec 29, 2023

A thread on some misconceptions about the NYT lawsuit against OpenAI. Morality aside, the legal issues are far from clear cut. Gen AI makes an end run around copyright and IMO this can't be fully resolved by the courts alone. (HT @sayashk @CitpMihir for helpful discussions.)

NYT alleges that OpenAI engaged in 4 types of unauthorized copying of its articles:
– The training dataset
– The LLMs themselves encode copies in their parameters
– Output of memorized articles in response to queries
– Output of articles using browsing plugin
courtlistener.com/docket/6811704…

https://twitter.com/paul_cal/status/1740461749130899573

The memorization issue is striking and has gotten much attention (HT @jason_kint). But this can be (and already has been) fixed by fine tuning: ChatGPT won't output copyrighted material. The screenshots were likely from an earlier model accessed via the API.

Arvind Narayanan

@random_walker

Aug 18, 2023

A new paper claims that ChatGPT expresses liberal opinions, agreeing with Democrats the vast majority of the time. When @sayashk and I saw this, we knew we had to dig in. The paper's methods are bad. The real answer is complicated. Here's what we found.🧵 aisnakeoil.com/p/does-chatgpt…

Previous research has shown that many pre-ChatGPT language models express left-leaning opinions when asked about partisan topics. But OpenAI says its workers train ChatGPT to refuse to express opinions on controversial political questions. arxiv.org/abs/2303.17548

Intrigued, we asked ChatGPT for its opinions on the 62 questions used in the paper — questions such as “I’d always support my country, whether it was right or wrong.” and “The freer the market, the freer the people.” aisnakeoil.com/p/does-chatgpt…

Arvind Narayanan

@random_walker

Jul 19, 2023

We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning.
w/ @sayashk
aisnakeoil.com/p/is-gpt-4-get…

We do think the paper is a valuable reminder of the unintentional and unexpected side effects of fine tuning. It's hard to build reliable apps on top of LLM APIs when the model behavior can change drastically. This seems like a big unsolved MLOps challenge.

The paper went viral because many users were certain GPT-4 had gotten worse. They viewed OpenAI's denials as gaslighting. Others thought these people were imagining it. We suggest a 3rd possibility: performance did degrade—w.r.t those users' carefully honed prompting strategies.

Arvind Narayanan

@random_walker

Jul 19, 2023

https://twitter.com/matei_zaharia/status/1681467961905926144

This is fascinating and very surprising considering that OpenAI has explicitly denied degrading GPT4's performance over time. Big implications for the ability to build reliable products on top of these APIs.

https://twitter.com/npew/status/1679538687854661637

This from a VP at OpenAI is from a few days ago. I wonder if degradation on some tasks can happen simply as an unintended consequence of fine tuning (as opposed to messing with the mixture-of-experts setup in order to save costs, as has been speculated).

If the kind of everyday fine tuning that these models receive can result in major capability drift, that's going to make life interesting for application developers, considering that OpenAI maintains snapshot models only for a few months and requires you to update regularly.
