Presenting
- Summary of #AutoGPT / @babyAGI_
- The 5 stages of "brain" development it took to get from Foundation Models to Autonomous Agents
- Why Full Autonomy is like "Full Self Driving"!
Begin:
(this is the obligatory threadooor TLDR of my latest newsletter post, hop over if you like my long-form work: latent.space/p/agents)
I think there have been 4 "Killer Apps" of AI so far.
"Killer App" as in:
- unquestionable PMF
- path to making >$100m/yr
- everybody holds it up as an example
They are: 1. Generative Text 2. Generative Art 3. Copilot for X 4. ChatGPT
We're seeing the birth of Killer App #5
🤖 What are AutoGPT and BabyAGI, and why are they "the next frontier of prompt engineering"?
Take the biggest open source AI projects you can think of. I don't care which.
AutoGPT **trounces** all of them. It's ~2 weeks old and it's not even close (see below).
And yet: AutoGPT isn't a new open source foundation model. Doesn't involve any deep ML innovation or understanding whatsoever. It is a pure prompt engineering win.
The key insight:
- applying existing LLM APIs (GPT-3, GPT-4, or others)
- and reasoning/tool prompt patterns (e.g. ReAct)
- in an infinite loop,
- to do indefinitely long-running, iterative work
- to accomplish a high level goal set by a human user (rough loop sketch below)
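For the curious, that whole loop fits in a few lines. Here's a minimal ReAct-style sketch in Python - my own stand-in code, not AutoGPT's actual internals; `llm()`, `search()`, and the prompt format are all placeholders:

```python
# Minimal ReAct-style agent loop: reason, pick a tool, observe, repeat.
# All names here are illustrative stand-ins, not AutoGPT's actual internals.
import re

def llm(prompt: str) -> str:
    raise NotImplementedError("wrap your completion API of choice here (GPT-3, GPT-4, ...)")

def search(query: str) -> str:
    raise NotImplementedError("e.g. a web search wrapper")

TOOLS = {"search": search}

def run_agent(goal: str, max_steps: int = 50) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):  # "continuous mode" would drop this cap entirely
        step = llm(transcript + "\nThink step by step, then reply with either\n"
                                "ACTION: <tool>: <input>\nor\nFINISH: <answer>\n")
        transcript += step + "\n"
        if "FINISH:" in step:
            return step.split("FINISH:", 1)[1].strip()
        m = re.search(r"ACTION:\s*(\w+):\s*(.+)", step)
        if m:
            observation = TOOLS[m.group(1)](m.group(2))    # act on the world...
            transcript += f"OBSERVATION: {observation}\n"  # ...and feed the result back in
    return transcript  # ran out of steps without finishing
```

The magic isn't in the code - it's in how well the model reasons inside that loop.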
We really mean "high level" when we say "high level":
@SigGravitas' original AutoGPT demo was: “an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth”
@yoheinakajima's original prompt was an AI to "start and grow a mobile AI startup"
Yes, that's it! You then lean on the AI's planning and self-prompting, and give it the tools it needs (e.g. browser search, or writing code) to achieve its goal by whatever means necessary. Mostly you can just hit "yes" to continue, or if you're feeling lucky/rich, you can run them in "continuous mode" and watch them blow through your @OpenAI budget.
The core difference between them is surprisingly simple:
@BabyAGI_ is intentionally smol. Initial MVP was <150 LOC, and its core loop is illustrated below.
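Roughly, that loop looks like this (my Python paraphrase, not the actual repo code - the real thing also stashes results in a vector store like Pinecone for context retrieval):

```python
# Rough paraphrase of the BabyAGI core loop (not the actual repo code).
# `llm()` stands in for any completion API.
from collections import deque

def llm(prompt: str) -> str:
    raise NotImplementedError("wrap your completion API of choice here")

def run(objective: str, first_task: str = "Develop a task list") -> None:
    tasks = deque([first_task])
    results: list[str] = []
    while tasks:  # runs until the task list empties (or you pull the plug)
        task = tasks.popleft()

        # 1. Execution agent: do the current task in light of the objective
        result = llm(f"Objective: {objective}\nRecent results: {results[-3:]}\n"
                     f"Your task: {task}\nResponse:")
        results.append(result)

        # 2. Task creation agent: spawn new tasks based on what just happened
        new = llm(f"Objective: {objective}\nLast result: {result}\n"
                  f"Incomplete tasks: {list(tasks)}\nNew tasks, one per line:")
        tasks.extend(t.strip() for t in new.splitlines() if t.strip())

        # 3. Prioritization agent: reorder the whole queue against the objective
        reordered = llm(f"Objective: {objective}\nReprioritize these tasks, one per line:\n"
                        + "\n".join(tasks))
        tasks = deque(t.strip() for t in reordered.splitlines() if t.strip())
```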
#AutoGPT is very expansive and has what Liam Neeson would call a particular set of skills, from reasonable ones like Google Search and Browse Website, to Cloning Repos, Sending Tweets, Executing Code, and spawning other agents (!)
Fortunately the @OpenAI strategy of building in safety at the foundation model layer has mitigated the immediate threat of paperclips.
Even when blatantly asked to be a paperclip maximizer, BabyAGI refuses.
The development of Autonomous AI started with the release of GPT-3 just under 3 years ago.
In the beginning, there were Foundation Models. @Francis_YAO_ explains how they provide natural language understanding and generation, store world knowledge, and display in-context learning.
Then we learned to *really* prompt them to improve their reasoning capabilities with @_jasonwei's Chain of Thought and other methods.
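For the uninitiated: chain-of-thought just means showing the model a worked example that reasons step by step before answering, so it does the same on your question. A toy version (illustrative wording, not the paper's exact prompts):

```python
# Toy chain-of-thought prompt: the worked example reasons step by step before
# answering, which nudges the model to do the same on the new question.
# (Illustrative wording, not the exact prompts from the paper.)
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many do they have?
A:"""
# A non-CoT baseline would show "A: The answer is 11." and ask the model to jump
# straight to a number - which is exactly where it tends to flub.
```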
Then we learned to add external memory, since you can't retrain models for every use case or for the passage of time. @danshipper notes they are *Reasoning Engines*, not omniscient oracles.
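In practice "external memory" usually means an embedding store: embed everything you've seen, pull back the top-k most relevant chunks at question time, and stuff them into the prompt. A minimal sketch (function names are mine; `embed()` and `llm()` stand in for whichever APIs you use):

```python
# Minimal sketch of LLM "external memory" via embedding retrieval.
# `embed()` and `llm()` stand in for whichever embedding / completion APIs you use.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("e.g. an embeddings endpoint returning a unit-norm vector")

def llm(prompt: str) -> str:
    raise NotImplementedError("completion API of choice")

memory: list[tuple[np.ndarray, str]] = []

def remember(text: str) -> None:
    memory.append((embed(text), text))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(memory, key=lambda item: -float(np.dot(item[0], q)))  # cosine sim if normalized
    return [text for _, text in ranked[:k]]

def answer(question: str) -> str:
    context = "\n".join(recall(question))
    # the model reasons over retrieved facts instead of relying on frozen weights
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```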
Then we handed the AI a browser, and let it both read from and write to the Internet. @sharifshameem and @natfriedman's early explorations were precursors of many browser agents to come.
Then we handed more and more and more tools to the AI, and let it write its own code to fill in the tools it doesn't yet have. @goodside's version of this is my favorite: "You Are GPT-3, and You Cannot Do Math" - but giving it a @replit so it can write whatever python it needs to do math. Brilliant.
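The pattern is dead simple: don't ask the model for the answer, ask it for a program, run the program, and hand the output back. A deliberately rough sketch (my code, not @goodside's; you'd want a real sandbox like the Replit rather than a bare `exec`):

```python
# "You cannot do math" pattern: ask the model for a program, not an answer,
# then execute the program and treat its output as the answer.
# exec() on model output is unsafe outside a sandbox - hence the Replit.
import contextlib, io, re

def llm(prompt: str) -> str:
    raise NotImplementedError("completion API of choice")

def solve_with_python(question: str) -> str:
    code = llm("You cannot do arithmetic yourself. Write a short Python program that "
               f"computes the answer to the following and prints it:\n{question}\n"
               "Reply with code only.")
    code = re.sub(r"^```(python)?\s*|\s*```$", "", code.strip())  # strip markdown fences if any
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)  # run the model-written program (sandbox this in real life!)
    return buf.getvalue().strip()
```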
@johnvmcdonnell's vision of Action-driven LLMs are here.
What's the last capability needed for Autonomous AI?
Planning.
Look at the 4 agents at work inside BabyAGI. One of them is something we've never really seen before.
We are asking the LLM to prioritize, reflect, and plan ahead - things that @SebastienBubeck's team (authors of the Sparks of AGI paper) specifically noted even GPT-4 was bad at.
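Concretely, that "new" agent is just a prompt asking the model to step back, reflect on progress, and re-plan - zooming in on the prioritization step from the loop sketch above (wording is mine, not BabyAGI's exact text):

```python
# The "planning" agent boiled down: ask the model to reflect and reprioritize.
# Prompt wording is illustrative, not BabyAGI's exact text.
def prioritize(objective: str, done: list[str], todo: list[str], llm) -> list[str]:
    prompt = (
        f"You are a task prioritization AI. Ultimate objective: {objective}\n"
        "Completed so far:\n- " + "\n- ".join(done or ["(nothing yet)"]) + "\n"
        "Remaining tasks:\n- " + "\n- ".join(todo) + "\n"
        "Reflect on progress, drop tasks that no longer matter, and return the rest "
        "in the order they should be done, one task per line."
    )
    return [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]
```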
This is the new frontier, and the new race. People with the best planning models and prompts will be able to make the best agents. (and games!)
@hwchase17's recent LangChain Agents webinar (excellent summary here) also highlighted the emerging need to orchestrate agents as they run into and communicate with each other.
Is all this just for fun? Or a serious opportunity?
I argue it's the latter. Civilization advances by extending the number of operations we can perform without thinking about them. By building automations and autonomous agents, we are extending the reach of our will.
Autonomous AI may be further away than it appears in this funhouse mirror, though.
Self-Driving Cars have been perpetually "5 years away" for a decade. We're seeing that now with Autonomous Agents - 2023 AI Agents are like 2015 Self-Driving Cars.
AutoGPT is more like "Level 1 Autonomy": it needs a lot of hand-holding, and often does things slower than we could do them ourselves anyway.
But still, the Level 5 future is clearly valuable.
excellent, short, and overlooked @mattrickard post about how humans convey information in natural language
i think everyone building agents will eventually have to come to terms with how they react to the different kinds of human feedback, and this is the first good model i've seen
That's a relatively uncontroversial prediction. One thing I neglected to address tho is "how does this give insight towards AGI?"
I avoid most AGI debates because of the difficulty of definition, but if it wasn't obvious from my human brain analogy, I do think developing a good planning/prioritization AI gets us very, very far in the AGI process.
We will probably need a different architecture than autoregressive generation to do this, but then again, we're *already making* a different architecture as we add things like memory and tools/browsers.
Assuming we solve this, I have a few related candidates for next frontiers:
- hypothesis forming
- symbolic, self pruning world model
- personality
- empathy and full theory of mind
this neurips is really going to be remembered as the "end of pretraining" neurips
notes from doctor @polynoamial's talk on scaling test time compute today
(thank you @oh_that_hat for organizing)
all gains to date have been from scaling data and pretrain compute, and yet LLMs can't solve simple problems like tic-tac-toe
however inference costs have scaled much less.
goes back to libratus/pluribus work
poker model scaling from 2012-2015 - scaled 5x each year, but still lost dramatically (9 big bets per hundred) to poker pros in 80k hands
recalls familiar insight about humans taking longer to think for harder problems.
adding 20s of search reduced the distance from Nash equilibrium by a factor of 7 - roughly the equivalent of scaling up model size by 100,000x
Here’s my @OpenAIDevs day thread for those following along. everyone else gotchu with videos and stuff so i will just give personal notes and aha moments thru the day
after a nice screenshot of CoCounsel, time for @romainhuet's legendary live demos. o1 one-shots an ios app and does the frontend/backend to control a drone.
ai controlled drones, what could go wrong?
@romainhuet Realtime API announced!
starting with speech to speech support
all 6 adv voice mode voices supported
just realized NotebookLM is @GoogleDeepMind's ChatGPT moment
- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there, with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features
in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically at the cost of extreme latency (took an efficient model that was advertised as capable of generating 30s of audio in 0.5s, and slapped on like 200s of LLM latency haha)
@GoogleDeepMind like, i put my podcast into it and it made a podcast of my podcast and... it was good.
do u guys know we spend 1-2 hrs writing up the show notes and now it's a button press in NBLM
Gemini really took pride in topping @lmsysorg for a hot second, and then @OpenAI said "oh no u dont" and put out 4 straight bangers, pounding everyone into the dust by 50 elo points
V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.
Multiple fronts to compete on - reasoning, multiturn chat tuning, instruction following, and coding.
anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by
i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1
holy shit @ideogram_ai thumbnails are untapped alpha
notable reveals from today's iphone 16 event, especially Apple Visual Intelligence:
- Mail and Notifications will show summaries instead of str[:x]
- Siri now knows iPhone, becomes the ultimate manual on how to use the increasingly complicated iOS 18
and can read your texts (!) to suggest actions with Personal Context Understanding
(also it will try to advertise apple tv shows to you... i'm SURE it will be totally objective and aligned to your preferences amirite)
- new iphone 16 camera control button is PRIME real estate - notice how OpenAI/ChatGPT is now next to Google search, and both are secondary clicks to Apple's visual search, which comes first
- camera adds events to calendar!
"all done on device" and on cloud (though craig doesnt say that haha)