swyx Profile picture
Apr 19, 2023 16 tweets 10 min read Read on X
🧠 The Anatomy of Autonomy 🤖

The fifth killer app of AI is Autonomous Agents.

Presenting
- Summary of #AutoGPT / @babyAGI_
- The 5 stages of "brain" development it took to get from Foundation Models to Autonomous Agents
- Why Full Autonomy is like "Full Self Driving"!

Begin:
@babyAGI_ (this is the obligatory threadooor TLDR of my latest newsletter post, hop over if you like my long form work: latent.space/p/agents)
I think there have been 4 "Killer Apps" of AI so far.

"Killer App" as in:
- unquestionable PMF
- path to making >$100m/yr
- everybody holds it up as an example

They are:
1. Generative Text
2. Generative Art
3. Copilot for X
4. ChatGPT

We're seeing the birth of Killer App #5.
🤖 What is AutoGPT and why is it "the next frontier of prompt engineering"?

Take the biggest open source AI projects you can think of. I don't care which.

AutoGPT **trounces** all of them. It's ~2 weeks old and it's not even close (see below).

And yet: AutoGPT isn't a new open source foundation model. Doesn't involve any deep ML innovation or understanding whatsoever. It is a pure prompt engineering win.

The key insight:

- applying existing LLM APIs (GPT3, 4, or others)
- and reasoning/tool prompt patterns (e.g. ReAct)
- in an infinite loop,
- to do indefinitely long-running, iterative work
- to accomplish a high level goal set by a human user
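The loop above can be sketched in a few lines. This is purely illustrative: `fake_llm` is a scripted stand-in for a real chat-completion call, and the tool names and prompt format are hypothetical, not AutoGPT's actual code.

```python
# Illustrative agent loop: an LLM picks the next action each iteration
# until the goal is met. fake_llm is scripted; a real agent would send
# a ReAct-style prompt to GPT-3/4 here.

TOOLS = {
    "search": lambda q: f"(results for {q!r})",
    "write_code": lambda spec: f"(code for {spec!r})",
}

def fake_llm(prompt: str, _script=iter([
    "search: competitors in mobile AI",
    "write_code: landing page",
    "DONE",
])) -> str:
    # Stand-in for an LLM API call; returns the next scripted action.
    return next(_script)

def run_agent(goal: str, llm=fake_llm, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (f"Goal: {goal}\nHistory: {history}\n"
                  "Next action as 'tool: input', or DONE:")
        action = llm(prompt)
        if action.strip() == "DONE":
            break
        tool, _, arg = action.partition(":")
        observation = TOOLS[tool.strip()](arg.strip())
        history.append(f"{action} -> {observation}")
    return history

steps = run_agent("start and grow a mobile AI startup")
```

Everything interesting lives in the prompt: the loop itself is trivial, which is exactly why this is a prompt engineering win rather than an ML one.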

We really mean "high level" when we say "high level":

@SigGravitas' original AutoGPT demo was: “an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth”

@yoheinakajima's original prompt was an AI to “start and grow a mobile AI startup”

Yes, that's it! You then lean on the AI's planning and self-prompting, and give it the tools it needs (e.g. browser search, or writing code) to achieve its goal by whatever means necessary. Mostly you can just hit "yes" to continue, or, if you're feeling lucky/rich, you can run them in "continuous mode" and watch them blow through your @OpenAI budget.

The core difference between them is surprisingly simple:

@BabyAGI_ is intentionally smol. Initial MVP was <150 LOC, and its core loop is illustrated below.

#AutoGPT is very expansive and has what Liam Neeson would call a particular set of skills, from reasonable ones like Google Search and Browse Website, to Cloning Repos, Sending Tweets, Executing Code, and spawning other agents (!)
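BabyAGI's loop has roughly this shape (a sketch of the pattern, not the original <150 LOC script — the three stand-in functions are LLM calls in the real thing):

```python
# Rough sketch of the BabyAGI task-queue loop: pop the top task,
# execute it, create follow-up tasks from the result, reprioritize.
from collections import deque

def execute(task, objective):
    # Stand-in for the execution agent (an LLM call in the real thing).
    return f"result of {task!r}"

def create_tasks(result, objective):
    # Stand-in for the task-creation agent: propose follow-up tasks.
    return [f"follow up on {result}"] if "research" in result else []

def prioritize(tasks, objective):
    # Stand-in for the prioritization agent: reorder toward the objective.
    return deque(sorted(tasks))

def baby_agi(objective, first_task, max_iters=3):
    tasks, results = deque([first_task]), []
    for _ in range(max_iters):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(task, objective)
        results.append(result)
        tasks.extend(create_tasks(result, objective))
        tasks = prioritize(tasks, objective)
    return results

log = baby_agi("grow a mobile AI startup", "research the market")
```

Note the loop never terminates on its own unless the task list empties — hence the budget warnings above.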
Fortunately the @OpenAI strategy of building in safety at the foundation model layer has mitigated the immediate threat of paperclips.

Even when blatantly asked to be a paperclip maximizer, BabyAGI refuses.

Incredibly common OpenAI Safety Team W.

The development of Autonomous AI started with the release of GPT3 just under 3 years ago.

In the beginning, there were Foundation Models. @Francis_YAO_ explains how they provide natural language understanding and generation, store world knowledge, and display in-context learning.

Then we learned to *really* prompt them to improve their reasoning capabilities with @_jasonwei's Chain of Thought and other methods.

Then we learned to add external memory, since you can't retrain models for every use case or for the passage of time. @danshipper notes they are *Reasoning Engines*, not omniscient oracles.
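The external-memory pattern looks roughly like this (a toy sketch: real systems use embeddings and a vector store, and the word-overlap scoring here is just a stand-in):

```python
# Toy "external memory": store facts, retrieve the most relevant ones
# by word overlap, and prepend them to the prompt so the model reasons
# over fresh facts instead of stale training data.

MEMORY: list[str] = []

def remember(fact: str) -> None:
    MEMORY.append(fact)

def recall(query: str, k: int = 2) -> list[str]:
    # Score each memory by crude word overlap with the query.
    q = set(query.lower().split())
    scored = sorted(MEMORY,
                    key=lambda m: len(q & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(recall(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

remember("BabyAGI was released in April 2023.")
remember("The sky is blue.")
prompt = build_prompt("When was BabyAGI released?")
```

The model never "learns" the fact — it just gets handed the relevant context at prompt time, which is the Reasoning Engine framing in code form.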

Then we handed the AI a browser, and let it both read from the Internet as well as write to it. @sharifshameem and @natfriedman's early explorations were a precursor of many browser agents to come.

Then we handed more and more and more tools to the AI, and let it write its own code to fill in the tools it doesn't yet have. @goodside's version of this is my favorite: "You Are GPT-3, and You Cannot Do Math" - but giving it a @replit so it can write whatever python it needs to do math. Brilliant.
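The "you cannot do math" pattern is simple to sketch: instead of letting the model guess at arithmetic, prompt it to emit a Python expression and evaluate that. `fake_llm` is scripted here; a real setup would prompt GPT to answer math questions with code only.

```python
# Delegate arithmetic to code: the LLM writes an expression, a small
# safe evaluator runs it (no eval() of arbitrary strings).
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow}

def safe_eval(expr: str):
    # Walk the AST, allowing only numeric constants and binary ops.
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def fake_llm(question: str) -> str:
    # Stand-in: a real LLM, prompted "respond only with a Python
    # expression", would translate the question itself.
    return "123456789 * 987654321"

answer = safe_eval(fake_llm("What is 123456789 times 987654321?"))
```

The model is terrible at big multiplication but excellent at translating questions into expressions — so you let each side do what it's good at.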

@johnvmcdonnell's vision of Action-driven LLMs are here.




What's the last capability needed for Autonomous AI?

Planning.

Look at the 4 agents at work inside of BabyAGI. There's one of them we've never really seen before.

We are asking the LLM to prioritize, reflect, and plan ahead - things that @SebastienBubeck's team (authors of the Sparks of AGI paper) specifically noted that even GPT-4 was bad at.

This is the new frontier, and the new race. People with the best planning models and prompts will be able to make the best agents. (and games!)

@hwchase17's recent LangChain Agents webinar (excellent summary here ) also highlighted the emerging need to orchestrate agents as they run into and communicate with each other.


Is all this just for fun? Or a serious opportunity?

I argue it's the latter. As Whitehead observed, civilization advances by extending the number of operations we can perform without thinking about them. By building automations, and autonomous agents, we are extending the reach of our will.
Full autonomy may be further away than it appears in this funhouse mirror, though.

Self-Driving Cars have been perpetually "5 years away" for a decade. We're seeing that now with Autonomous Agents - 2023 AI Agents are like 2015 Self Driving Cars.

AutoGPT is more like "Level 1 Autonomy": it needs a lot of help, and even then often completes tasks slower than we could do them ourselves.

But still, the Level 5 future is clearly valuable.
excellent, short, and overlooked @mattrickard post about how humans convey information in natural language

i think everyone building agents will eventually have to come to terms with how they react to the different kinds of human feedback, and this is the first good model i've seen
That's a relatively uncontroversial prediction. One thing I neglected to address tho is "how does this give insight towards AGI?"

I avoid most AGI debates because of the difficulty of definition, but if it wasn't obvious from my human brain analogy, I do think developing a good planning/priorities AI gets us very far in the AGI process.

We will probably need a different architecture than autoregressive generation to do this, but then again, we're *already making* a different architecture as we add things like memory and tools/browsers.

Assuming we solve this, I have a few related candidates for next frontiers:
- hypothesis forming
- symbolic, self-pruning world model
- personality
- empathy and full theory of mind

(i touched on a few in )
Lol I just got done saying that LLMs can't do planning very well and so we are safe until GPT5 drops…

and then 1 week later Cornell kids come along and point out that you can just give LLMs a planning tool and it Just Works lmao 🤦‍♂️

never underestimate AI progress, holy hell
@babyAGI_ whoa - i didn't realize but my visualization chart is now in the official BabyAGI readme!
@lilianweng as always comes in with the definitive survey:
very cool to see @sashaorloff use the 5-level autonomy framing to describe his product - more agent-type companies should use it
