I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library.
Generally capable, autonomous agents are the next frontier of AI. They continuously explore, plan, and develop new skills in open-ended worlds, driven by survival & curiosity.
Minecraft is by far the best testbed with endless possibilities for agents.
Voyager has 3 key components: 1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs; 2) A skill library of code to store & retrieve complex behaviors; 3) An automatic curriculum to maximize exploration.
First, Voyager attempts to write a program to achieve a particular goal, using a popular JavaScript Minecraft API (Mineflayer). The program is likely incorrect on the first try. Game environment feedback and JavaScript execution errors (if any) help GPT-4 refine the program.
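To make that refinement loop concrete, here's a rough Python sketch of the idea. `llm_write_program`, `run_in_minecraft`, and `llm_self_verify` are hypothetical stand-ins for the GPT-4 calls and the Mineflayer execution sandbox, not Voyager's actual API:

```python
from typing import Optional

def iterative_prompting(task: str, context_skills=(), max_rounds: int = 4) -> Optional[str]:
    """Write a Mineflayer program for `task`, refining it with feedback."""
    program, feedback, error = "", "", ""
    for _ in range(max_rounds):
        # Ask GPT-4 to (re)write the JavaScript program, conditioned on the
        # previous attempt, game feedback, any execution error, and retrieved skills.
        program = llm_write_program(task, program, feedback, error, context_skills)
        # Run the program against the Mineflayer API inside the game.
        feedback, error = run_in_minecraft(program)
        # A separate GPT-4 call self-verifies whether the task was actually completed.
        if llm_self_verify(task, feedback):
            return program  # success: this program can be committed to the skill library
    return None  # give up after max_rounds refinement attempts
```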
Second, Voyager incrementally builds a skill library by storing the successful programs in a vector DB. Each program can be retrieved by the embedding of its docstring. Complex skills are synthesized by composing simpler skills, which compounds Voyager’s capabilities over time.
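Here's a toy sketch of what "retrieve by docstring embedding" means in practice. This is my illustration, not the paper's code: `embed` is a placeholder for any text-embedding model, and the real system uses a vector DB rather than an in-memory list:

```python
import numpy as np

class SkillLibrary:
    def __init__(self):
        self.vectors = []   # embeddings of skill docstrings
        self.programs = []  # the corresponding JavaScript programs

    def add(self, docstring: str, program: str) -> None:
        # Each successful program is indexed by the embedding of its docstring.
        self.vectors.append(embed(docstring))
        self.programs.append(program)

    def retrieve(self, query: str, k: int = 5) -> list:
        # Return the k skills whose docstrings are most similar to the query
        # (cosine similarity), so GPT-4 can reuse & compose them in new programs.
        if not self.programs:
            return []
        q = embed(query)
        mat = np.stack(self.vectors)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.programs[i] for i in top]
```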
Third, an automatic curriculum proposes suitable exploration tasks based on the agent’s current skill level & world state, e.g. learn to harvest sand & cactus before iron if it finds itself in a desert rather than a forest.
Think of it as an in-context form of *novelty search*.
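Roughly, the curriculum is just another GPT-4 prompt over the agent's state. A paraphrased sketch (not the exact prompt from the paper; `llm` is a hypothetical chat-completion call):

```python
CURRICULUM_TEMPLATE = """You are guiding a Minecraft agent toward discovering
as many diverse things as possible (novelty search).

Biome: {biome}
Inventory: {inventory}
Completed tasks: {completed}
Failed tasks: {failed}

Propose the single next task that is novel but achievable at the agent's
current skill level. Respond with one short imperative sentence."""

def propose_next_task(world_state: dict, completed: list, failed: list) -> str:
    prompt = CURRICULUM_TEMPLATE.format(
        biome=world_state["biome"],
        inventory=world_state["inventory"],
        completed=completed,
        failed=failed,
    )
    return llm(prompt)  # e.g. "Harvest 3 cactus blocks." when spawned in a desert
```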
Putting it all together, here's the full data-flow design that drives lifelong learning in a vast 3D voxel world without any human intervention.
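In pseudocode, the whole loop looks roughly like this, wiring together the sketches above (names are illustrative, not the repo's API):

```python
def lifelong_learning(world_state: dict, iterations: int = 160) -> SkillLibrary:
    library = SkillLibrary()
    completed, failed = [], []
    for _ in range(iterations):
        # 1) Automatic curriculum proposes the next exploration task.
        task = propose_next_task(world_state, completed, failed)
        # 2) Retrieve relevant skills so GPT-4 can compose them into new code.
        context_skills = library.retrieve(task)
        # 3) Iteratively write & refine a program with game + error feedback.
        program = iterative_prompting(task, context_skills)
        if program is not None:
            library.add(docstring=task, program=program)  # commit the new skill
            completed.append(task)
        else:
            failed.append(task)
    return library
```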
Let’s look at some experiments!
We evaluate Voyager systematically against other LLM-based agent techniques in Minecraft, such as ReAct, Reflexion, and the popular AutoGPT.
Voyager discovers 63 unique items within 160 prompting iterations, 3.3x more than the next best approach.
The novelty-seeking automatic curriculum naturally compels Voyager to travel extensively. Without being explicitly instructed to do so, Voyager traverses 2.3x longer distances and visits more terrains than the baselines, which are “lazier” and often stuck in local areas.
How good is the “trained model”, i.e. the skill library after lifelong learning?
We clear the agent’s inventory and armor, spawn a new world, and test on unseen tasks. Voyager solves them significantly faster. Our skill library even boosts AutoGPT, since code is easily transferable.
Voyager is currently text-only, but it can be augmented with visual perception in the future. We do a preliminary study where humans act as an image captioning model and provide feedback to Voyager.
It is able to construct complex 3D structures, such as a Nether Portal and a house.
3 mo ago, I said Windows will be the first AI-first OS. Surely, Microsoft delivers with a sharp vision and steady hand. To me, Windows Copilot is a way bigger deal than Bing Chat. It's becoming a full-fledged agent that takes *actions* at the OS & native-software level, given…
Sorry, re-posted this because the video failed to play in the previous post.
A nice evolution tree of the major LLM milestones. The authors also curated a GitHub "awesome list" of LLM practical guides. They cover not just models, but also pretraining/finetuning datasets and evaluation protocols.
If you insert electric probes into an insect *before* adulthood, its tissues can organically grow around the probe and unlock a high-bandwidth insect-machine interface.
Then you can read data from the insect's brain and *control* its flight by stimulation. This is from 2009, but…
Paper: Insect–Machine Interface Based Neurocybernetics.
Note: I'm just pointing out the technical feasibility of this. People thought about it in 2009, so it isn't a new idea. The technology is well within reach - IMHO < 5 years if someone works on this seriously.
Now whether we *should* do it is a completely different issue.
Don't watch AI FOMO and fear-mongering videos on YouTube. Watch the excellent talk from John Schulman @johnschulman2, creator of RLHF that powers GPT-4. Now *this* qualifies as "insane" ingenuity, if you ask me.
Finally happening: HuggingFace Transformers Agent. It enables a coding LLM to compose other HF models on the fly to solve multimodal tasks.
It's a step towards the Everything App, which grows in capability as the ecosystem grows.
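If I read the launch docs right (transformers >= 4.29; the interface is experimental and may change), usage looks roughly like this:

```python
from PIL import Image
from transformers import HfAgent

# A code-writing LLM (here the hosted StarCoder endpoint) acts as the controller.
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

image = Image.open("photo.jpg")  # any local image; the path is illustrative

# The agent generates Python that calls other HF models (captioning, TTS, ...)
# picked on the fly to satisfy each request.
caption = agent.run("Caption the following `image`.", image=image)
speech = agent.run("Read the following `text` out loud.", text=caption)
```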
I've been waiting for this since HuggingGPT: 🧵
HuggingGPT is the first demonstration of such an idea at scale. It uses GPT as a controller to dynamically pick tools (models) to solve a multi-stage task.
The ChatGPT plugin store is of course the first instantiation of such an AI tool ecosystem. OpenAI's next step is obviously a meta app that can use lots of plugins in a single session.