Building AI that makes autonomous decisions using world models, artificial curiosity, and temporal abstraction @DeepMind
Apr 2 • 8 tweets • 3 min read
Excited to share that DreamerV3 has been published in Nature!
Dreamer solves control tasks by imagining the future outcomes of its actions inside of a continuously learned world model 🌏
It's the first agent to find diamonds in Minecraft from scratch without human data! 💎
👇
🪄 Robustness techniques remove the need for practitioners to tweak the algorithm, making it easy to apply to new tasks!
Dreamer outperforms more specialized algorithms (and general ones like PPO) across a diverse range of benchmarks 🚀
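To make "imagining the future outcomes of its actions" concrete, here is a minimal toy sketch of scoring a policy by rolling it out inside a model instead of the real environment. This is not Dreamer's implementation: `imagine_return`, `toy_model`, `toy_policy`, and `toy_reward` are hypothetical names, and the hand-written one-dimensional dynamics stand in for a learned world model.

```python
def imagine_return(state, policy, model, reward_fn, horizon=15, discount=0.99):
    """Sum discounted rewards along a rollout that never touches the real environment."""
    total, weight = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)        # choose an action from the current imagined state
        state = model(state, action)  # the model predicts the next state
        total += weight * reward_fn(state)
        weight *= discount
    return total


# Hypothetical stand-ins for learned components (assumptions for illustration only).
def toy_model(state, action):
    return state + action             # trivially simple dynamics


def toy_policy(state):
    return -0.5 * state               # move halfway toward the origin


def toy_reward(state):
    return -abs(state)                # reward is higher closer to the origin


score = imagine_return(4.0, toy_policy, toy_model, toy_reward, horizon=2, discount=1.0)
```

In an actual agent, the model would be learned from experience and the policy would be trained on many such imagined rollouts.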
Oct 25, 2022 • 9 tweets • 5 min read
Current RL algorithms still struggle under partial observability, which is common e.g. in real 3D environments. Excited to introduce the Memory Maze benchmark, carefully designed for evaluating long-term memory of RL algorithms! 🏠🤖🚀 @jurgisp @countzerozzz
Memory Maze includes two benchmarks: (1) An online RL benchmark where the agent is repeatedly tasked with finding different objects in randomly generated mazes, and (2) An offline dataset and probing benchmark for evaluating state representations. Each comes in 4 difficulty levels.
Jun 29, 2022 • 11 tweets • 5 min read
A dream come true! We introduce DayDreamer, where we apply world models for fast end-to-end learning on 4 physical robots, without simulators.
We learn quadruped walking from scratch in 1 hour. We also learn to pick & place balls directly from pixels and sparse rewards 🤖🌏👇
Deep reinforcement learning often needs too much trial and error to be practical on physical robots, which means one needs to train in simulation first. But simulators don't capture the complexity of the real world, and the resulting policies don't adapt to changes in the world.
Jun 10, 2022 • 10 tweets • 4 min read
Excited to share Director, a practical, general, and interpretable reinforcement learning algorithm for learning hierarchical behaviors from pixels!
Director explores and solves long-horizon tasks with very sparse rewards by breaking them down into internal subgoals.
Thread 👇
Solving long-horizon tasks is one of the few challenges in embodied AI that I think compute alone will not solve. Humans easily break down complex tasks into subgoals and achieve them through millions of muscle commands. Yet current methods struggle beyond a few hundred decisions.
Feb 23, 2021 • 8 tweets • 4 min read
Excited to present Clockwork VAEs for video prediction!
Clockwork VAEs (CW-VAEs) leverage hierarchies of latent sequences, where higher levels tick slower. They learn long-term deps across 1000 frames, semantically separate content, and outperform strong video models.
👇 Thread
On 4 diverse datasets, Clockwork VAEs yield more accurate long-term predictions than strong baselines. VTA also uses temporal abstraction but its consecutive frames are inconsistent. RSSM and SVG make plausible predictions but ignore long deps. SVG sometimes falls off manifold.
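To illustrate what "higher levels tick slower" means, here is a small sketch of a clockwork update schedule. It assumes that level `l` updates once every `factor ** l` steps; `active_levels`, `num_levels`, and `factor` are hypothetical names and values chosen for illustration, not taken from the CW-VAE code.

```python
def active_levels(t, num_levels=3, factor=4):
    """Return the levels that update at step t, assuming level l ticks every factor**l steps."""
    return [level for level in range(num_levels) if t % (factor ** level) == 0]


# Level 0 updates every frame, level 1 every 4 frames, level 2 every 16 frames.
schedule = [active_levels(t) for t in range(17)]
```

Under this assumption, the top level only changes state a handful of times over a long video, which is one way a hierarchy can carry information across many frames.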