Jim Fan (@DrJimFan)
May 26 · 12 tweets · 7 min read
What if we set GPT-4 free in Minecraft? ⛏️

I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library.

GPT-4 unlocks… twitter.com/i/web/status/1…

Generally capable, autonomous agents are the next frontier of AI. They continuously explore, plan, and develop new skills in open-ended worlds, driven by survival & curiosity.

Minecraft is by far the best testbed with endless possibilities for agents:

Voyager has 3 key components:

1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs;
2) A skill library of code to store & retrieve complex behaviors;
3) An automatic curriculum to maximize exploration.

First, Voyager attempts to write a program that achieves a particular goal, using Mineflayer, a popular JavaScript Minecraft API. The program is likely to be incorrect on the first try; game-environment feedback and JavaScript execution errors (if any) help GPT-4 refine it.

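The refinement loop described in this tweet can be sketched as below. This is a minimal Python sketch, not Voyager's code: `generate`, `execute`, and `verify` are hypothetical callables standing in for the GPT-4 call, the Mineflayer runtime, and the self-verification step, and the limit of 4 rounds is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecResult:
    success: bool
    error: Optional[str] = None  # JavaScript exception text, if the program crashed
    feedback: str = ""           # game-environment observations, e.g. inventory changes


def iterative_prompting(
    task: str,
    generate: Callable[[str, str], str],        # hypothetical LLM call: (task, critique) -> program
    execute: Callable[[str], ExecResult],       # hypothetical runner, e.g. a Mineflayer bot
    verify: Callable[[str, ExecResult], bool],  # hypothetical self-verification check
    max_rounds: int = 4,                        # assumed retry budget
) -> Optional[str]:
    """Generate, run, and refine a program until it passes verification."""
    critique = ""
    for _ in range(max_rounds):
        program = generate(task, critique)
        result = execute(program)
        if result.success and verify(task, result):
            return program  # verified: a candidate for the skill library
        # Fold the execution error and game feedback into the next prompt.
        critique = f"error: {result.error}\nfeedback: {result.feedback}"
    return None  # gave up after max_rounds


# Toy stubs standing in for GPT-4 and the game.
def fake_generate(task: str, critique: str) -> str:
    return "attempt_2" if critique else "attempt_1"


def fake_execute(program: str) -> ExecResult:
    if program == "attempt_1":
        return ExecResult(False, error="ReferenceError: mineLog is not defined")
    return ExecResult(True, feedback="inventory: +1 oak_log")


def fake_verify(task: str, result: ExecResult) -> bool:
    return "oak_log" in result.feedback


result = iterative_prompting("mine 1 oak log", fake_generate, fake_execute, fake_verify)
print(result)  # attempt_2
```

The first attempt fails with an error, the error text becomes the critique, and the second attempt passes both execution and verification.
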
Second, Voyager incrementally builds a skill library by storing successful programs in a vector DB, where each program is indexed for retrieval by the embedding of its docstring. Complex skills are synthesized by composing simpler ones, so Voyager’s capabilities compound over time.

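A minimal sketch of a docstring-indexed skill library, assuming a toy bag-of-words vector in place of a real text-embedding model and a plain dict in place of a vector DB. `SkillLibrary`, `embed`, and the skill names are hypothetical; only the idea of retrieving programs by the embedding of their docstrings comes from the thread.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in for a learned text embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    def __init__(self) -> None:
        self._skills: Dict[str, Tuple[str, Counter]] = {}  # name -> (code, docstring embedding)

    def add(self, name: str, docstring: str, code: str) -> None:
        # Commit a verified program, keyed by the embedding of its docstring.
        self._skills[name] = (code, embed(docstring))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Rank stored skills by similarity between the query and each docstring.
        q = embed(query)
        ranked = sorted(self._skills, key=lambda n: cosine(q, self._skills[n][1]), reverse=True)
        return ranked[:k]


lib = SkillLibrary()
lib.add("mineWoodLog", "Mine wood logs from nearby trees", "/* JS body */")
lib.add("craftPlanks", "Craft wooden planks from wood logs", "/* JS body */")
lib.add("smeltIronOre", "Smelt iron ore in a furnace to get iron ingots", "/* JS body */")
print(lib.retrieve("make planks out of logs", k=1))  # ['craftPlanks']
```

Because retrieval returns code for simpler skills, a prompt for a harder task can include them, which is one way composition could let capabilities build on each other.
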
Third, an automatic curriculum proposes suitable exploration tasks based on the agent’s current skill level & world state, e.g. learning to harvest sand & cactus before iron if it finds itself in a desert rather than a forest.

Think of it as an in-context form of *novelty search*.

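The thread does not spell out how the curriculum generates its proposals. As an illustration only, the sketch below replaces that step with a hypothetical rule table: it proposes the first untried task whose resource is nearby and whose prerequisites are already in the inventory, which reproduces the desert example (sand and cactus before iron). `TASKS`, `WorldState`, and `propose_next_task` are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical catalogue: task -> (block that must be nearby, items that must be in inventory)
TASKS: Dict[str, Tuple[str, Set[str]]] = {
    "collect sand": ("sand", set()),
    "collect cactus": ("cactus", set()),
    "mine iron ore": ("iron_ore", {"stone_pickaxe"}),
    "collect oak log": ("oak_log", set()),
}


@dataclass
class WorldState:
    nearby_blocks: Set[str]
    inventory: Set[str]
    completed: Set[str] = field(default_factory=set)
    failed: Set[str] = field(default_factory=set)


def propose_next_task(state: WorldState) -> Optional[str]:
    """Return the first novel task that the current world state makes feasible."""
    for task, (block, needs) in TASKS.items():
        if task in state.completed or task in state.failed:
            continue  # novelty: this toy version never re-proposes a tried task
        if block in state.nearby_blocks and needs <= state.inventory:
            return task
    return None  # nothing both new and feasible right now


desert = WorldState(nearby_blocks={"sand", "cactus", "iron_ore"}, inventory=set())
first = propose_next_task(desert)   # "collect sand"
desert.completed.add(first)
second = propose_next_task(desert)  # "collect cactus"
desert.completed.add(second)
third = propose_next_task(desert)   # None: iron ore needs a stone_pickaxe first
print(first, second, third)
```
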
Putting these all together, the full data flow design drives lifelong learning in a vast 3D voxel world without any human intervention.

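A hedged Python sketch of how the three components could be wired into a single loop: the curriculum proposes a task, the library supplies relevant skills, iterative prompting attempts the task, and verified programs are committed. All function names are hypothetical, and the toy stand-ins at the bottom exist only to make the loop runnable.

```python
from typing import Callable, Dict, List, Optional, Set


def lifelong_learning(
    propose_task: Callable[[Set[str]], Optional[str]],            # curriculum: tried tasks -> next task
    retrieve_skills: Callable[[str, Dict[str, str]], List[str]],  # skill retrieval for the task
    attempt: Callable[[str, List[str]], Optional[str]],           # iterative prompting -> verified program
    max_iterations: int = 10,
) -> Dict[str, str]:
    library: Dict[str, str] = {}  # task -> verified program
    completed: Set[str] = set()
    failed: Set[str] = set()
    for _ in range(max_iterations):
        task = propose_task(completed | failed)
        if task is None:
            break  # curriculum has nothing new to propose
        context = retrieve_skills(task, library)
        program = attempt(task, context)
        if program is not None:
            library[task] = program  # commit: later tasks can reuse it
            completed.add(task)
        else:
            failed.add(task)
    return library


# Toy stand-ins for the three components.
PLAN = ["mine log", "craft planks", "craft table"]


def toy_curriculum(tried: Set[str]) -> Optional[str]:
    return next((t for t in PLAN if t not in tried), None)


def toy_retrieve(task: str, library: Dict[str, str]) -> List[str]:
    return list(library)  # naive: hand every known skill to the prompt


def toy_attempt(task: str, skills: List[str]) -> Optional[str]:
    return f"{task} uses {skills}"


library = lifelong_learning(toy_curriculum, toy_retrieve, toy_attempt)
print(library["craft table"])  # craft table uses ['mine log', 'craft planks']
```

The point of the toy run is the compounding: the third task is attempted with both earlier skills already in the library.
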
Let’s look at some experiments!

We evaluate Voyager systematically against other LLM-based agent techniques in Minecraft, such as ReAct, Reflexion, and the popular AutoGPT.

Voyager discovers 63 unique items within 160 prompting iterations, 3.3x more than the next best approach.

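One plausible way to tally a "unique items discovered" curve, assuming (hypothetically) that the agent's inventory is logged once per prompting iteration; the thread does not say how the metric is computed, and `unique_item_curve` is an invented name.

```python
from typing import List


def unique_item_curve(inventory_log: List[List[str]]) -> List[int]:
    """Cumulative count of distinct items ever held, one entry per iteration."""
    seen: set = set()
    curve: List[int] = []
    for snapshot in inventory_log:  # one inventory snapshot per prompting iteration
        seen.update(snapshot)
        curve.append(len(seen))
    return curve


log = [["oak_log"], ["oak_log", "oak_planks"], ["oak_planks", "stick", "crafting_table"], ["stick"]]
curve = unique_item_curve(log)
print(curve)  # [1, 2, 4, 4]
```

Read as a ratio, 63 items at 3.3x implies the next-best method found roughly 19 unique items (63 / 3.3 ≈ 19) under the same budget.
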
The novelty-seeking automatic curriculum naturally compels Voyager to travel extensively. Without being explicitly instructed to do so, Voyager traverses 2.3x greater distances and visits more terrain types than the baselines, which are “lazier” and often get stuck in local areas.

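A minimal sketch of how distance traveled and terrain coverage could be measured, assuming a hypothetical log of (x, y, z) positions and biome labels; the thread does not specify how either quantity was computed.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def path_length(positions: Sequence[Position]) -> float:
    """Total Euclidean distance along consecutive logged (x, y, z) positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def terrains_visited(biomes: List[str]) -> int:
    """Number of distinct biome labels seen along the trajectory."""
    return len(set(biomes))


trajectory = [(0, 64, 0), (3, 64, 4), (3, 64, 10)]
biomes = ["forest", "forest", "desert"]
print(path_length(trajectory), terrains_visited(biomes))  # 11.0 2
```
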
How good is the “trained model”, i.e. the skill library after lifelong learning?

We clear the agent’s inventory/armor, spawn a new world, and test with unseen tasks. Voyager solves them significantly faster. Our skill library even boosts AutoGPT, since code is easily transferable.

Voyager is currently text-only, but could be augmented with visual perception in the future. We ran a preliminary study in which humans act as an image-captioning model and provide feedback to Voyager.

With this feedback, it is able to construct complex 3D structures, such as a Nether Portal and a house.

Let agents emerge in Minecraft! All open-source: voyager.minedojo.org

This work is co-authored by my team at NVIDIA: @guanzhi_wang (our awesome intern), @yuqi_xie5, @YunfanJiang, @AjayMandlekar, @ChaoweiX, @yukez, @DrJimFan (myself, co-advisor), @AnimaAnandkumar (co-advisor).
