I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library.
Generally capable, autonomous agents are the next frontier of AI. They continuously explore, plan, and develop new skills in open-ended worlds, driven by survival & curiosity.
Minecraft is by far the best testbed with endless possibilities for agents.
Voyager has 3 key components: 1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs; 2) A skill library of code to store & retrieve complex behaviors; 3) An automatic curriculum to maximize exploration.
First, Voyager attempts to write a program to achieve a particular goal, using a popular JavaScript Minecraft API (Mineflayer). The program is likely incorrect on the first try. Game environment feedback and JavaScript execution errors (if any) help GPT-4 refine the program.
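To make that refinement loop concrete, here's a rough Python sketch of the idea. `llm_write_program`, `run_in_minecraft`, and `llm_self_verify` are hypothetical stand-ins for the GPT-4 calls and the Mineflayer execution sandbox, not Voyager's actual API:

```python
from typing import Optional

def iterative_prompting(task: str, context_skills=(), max_rounds: int = 4) -> Optional[str]:
    """Write a Mineflayer program for `task`, refining it with feedback."""
    program, feedback, error = "", "", ""
    for _ in range(max_rounds):
        # Ask GPT-4 to (re)write the JavaScript program, conditioned on the
        # previous attempt, game feedback, any execution error, and retrieved skills.
        program = llm_write_program(task, program, feedback, error, context_skills)
        # Run the program against the Mineflayer API inside the game.
        feedback, error = run_in_minecraft(program)
        # A separate GPT-4 call self-verifies whether the task was actually completed.
        if llm_self_verify(task, feedback):
            return program  # success: this program can be committed to the skill library
    return None  # give up after max_rounds refinement attempts
```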
Second, Voyager incrementally builds a skill library by storing the successful programs in a vector DB. Each program can be retrieved by the embedding of its docstring. Complex skills are synthesized by composing simpler skills, which compounds Voyager’s capabilities over time.
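Here's a toy sketch of what "retrieve by docstring embedding" means in practice. This is my illustration, not the paper's code: `embed` is a placeholder for any text-embedding model, and the real system uses a vector DB rather than an in-memory list:

```python
import numpy as np

class SkillLibrary:
    def __init__(self):
        self.vectors = []   # embeddings of skill docstrings
        self.programs = []  # the corresponding JavaScript programs

    def add(self, docstring: str, program: str) -> None:
        # Each successful program is indexed by the embedding of its docstring.
        self.vectors.append(embed(docstring))
        self.programs.append(program)

    def retrieve(self, query: str, k: int = 5) -> list:
        # Return the k skills whose docstrings are most similar to the query
        # (cosine similarity), so GPT-4 can reuse & compose them in new programs.
        if not self.programs:
            return []
        q = embed(query)
        mat = np.stack(self.vectors)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.programs[i] for i in top]
```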
Third, an automatic curriculum proposes suitable exploration tasks based on the agent’s current skill level & world state, e.g. learn to harvest sand & cactus before iron if it finds itself in a desert rather than a forest.
Think of it as an in-context form of *novelty search*.
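Roughly, the curriculum is just another GPT-4 prompt over the agent's state. A paraphrased sketch (not the exact prompt from the paper; `llm` is a hypothetical chat-completion call):

```python
CURRICULUM_TEMPLATE = """You are guiding a Minecraft agent toward discovering
as many diverse things as possible (novelty search).

Biome: {biome}
Inventory: {inventory}
Completed tasks: {completed}
Failed tasks: {failed}

Propose the single next task that is novel but achievable at the agent's
current skill level. Respond with one short imperative sentence."""

def propose_next_task(world_state: dict, completed: list, failed: list) -> str:
    prompt = CURRICULUM_TEMPLATE.format(
        biome=world_state["biome"],
        inventory=world_state["inventory"],
        completed=completed,
        failed=failed,
    )
    return llm(prompt)  # e.g. "Harvest 3 cactus blocks." when spawned in a desert
```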
Putting it all together, here's the full data-flow design that drives lifelong learning in a vast 3D voxel world without any human intervention.
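In pseudocode, the whole loop looks roughly like this, wiring together the sketches above (names are illustrative, not the repo's API):

```python
def lifelong_learning(world_state: dict, iterations: int = 160) -> SkillLibrary:
    library = SkillLibrary()
    completed, failed = [], []
    for _ in range(iterations):
        # 1) Automatic curriculum proposes the next exploration task.
        task = propose_next_task(world_state, completed, failed)
        # 2) Retrieve relevant skills so GPT-4 can compose them into new code.
        context_skills = library.retrieve(task)
        # 3) Iteratively write & refine a program with game + error feedback.
        program = iterative_prompting(task, context_skills)
        if program is not None:
            library.add(docstring=task, program=program)  # commit the new skill
            completed.append(task)
        else:
            failed.append(task)
    return library
```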
Let’s look at some experiments!
We evaluate Voyager systematically against other LLM-based agent techniques in Minecraft, such as ReAct, Reflexion, and the popular AutoGPT.
Voyager discovers 63 unique items within 160 prompting iterations, 3.3x more than the next best approach.
The novelty-seeking automatic curriculum naturally compels Voyager to travel extensively. Without being explicitly instructed to do so, Voyager traverses 2.3x longer distances and visits more terrains than the baselines, which are “lazier” and often stuck in local areas.
How good is the “trained model”, i.e. the skill library after lifelong learning?
We clear the agent’s inventory and armor, spawn a new world, and test on unseen tasks. Voyager solves them significantly faster. Our skill library even boosts AutoGPT, since code is easily transferable.
Voyager is currently text-only, but it can be augmented with visual perception in the future. We do a preliminary study where humans act as an image captioning model and provide feedback to Voyager.
It is able to construct complex 3D structures, such as a Nether Portal and a house.
3 mo ago, I said Windows will be the first AI-first OS. Surely, Microsoft delivers with a sharp vision and steady hand. To me, Windows Copilot is a way bigger deal than Bing Chat. It's becoming a full-fledged agent that takes *actions* at the OS & native-software level, given…
Sorry, re-posted this because the video failed to play in the previous post.
A nice evolution tree of the major LLM milestones. The authors also curated a GitHub "awesome list" of LLM practical guides. They cover not just models, but also pretraining/finetuning datasets and evaluation protocols.
If you insert electric probes into an insect *before* adulthood, its tissues can organically grow around the probe and unlock a high-bandwidth insect-machine interface.
Then you can read data from the insect's brain and *control* its flight by stimulation. This is from 2009, but…
Paper: Insect–Machine Interface Based Neurocybernetics.
Note: I'm just pointing out the technical feasibility of this. People thought about it in 2009, so it isn't a new idea. The technology is well within reach - IMHO < 5 years if someone works on this seriously.
Now whether we *should* do it is a completely different issue.
Don't watch AI FOMO and fear-mongering videos on YouTube. Watch the excellent talk from John Schulman @johnschulman2, creator of RLHF that powers GPT-4. Now *this* qualifies as "insane" ingenuity, if you ask me.
Finally happening: HuggingFace Transformers Agent. It enables a coding LLM to compose other HF models on the fly to solve multimodal tasks.
It's a step towards the Everything App, which grows in capability as the ecosystem grows.
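If I read the launch docs right (transformers >= 4.29; the interface is experimental and may change), usage looks roughly like this:

```python
from PIL import Image
from transformers import HfAgent

# A code-writing LLM (here the hosted StarCoder endpoint) acts as the controller.
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

image = Image.open("photo.jpg")  # any local image; the path is illustrative

# The agent generates Python that calls other HF models (captioning, TTS, ...)
# picked on the fly to satisfy each request.
caption = agent.run("Caption the following `image`.", image=image)
speech = agent.run("Read the following `text` out loud.", text=caption)
```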
I've been waiting for this since HuggingGPT: 🧵
HuggingGPT is the first demonstration of such an idea at scale. It uses GPT as a controller to dynamically pick tools (models) to solve a multi-stage task.
The ChatGPT plugin store is of course the first instantiation of such an AI tool ecosystem. OpenAI's next step is obviously a meta app that can use lots of plugins in a single session.