Reinforcement learning typically trains & tests agents on the same game. New work shows how our team trains generally capable agents on huge game spaces, resulting in agents that generalise to held-out test games, & learn behaviours like experimentation dpmd.ai/open-ended-blog 1/
Rather than training on a limited number of tasks, our team defines a whole universe of tasks that can be procedurally generated, from simple object finding games to complex strategic games like Capture the Flag. 2/
By constructing a hierarchical learning process with an open-ended, iteratively refined objective, we trained agents that never stop learning and develop increasingly general behaviour across games. 3/
Our team finds these agents are able to generalise to many hand-authored probe tasks, and can solve out-of-distribution probe tasks through experimentation and tool use. 4/
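For intuition, the combinatorial task space in tweet 2 can be sketched as sampling goal predicates over sampled objects. Everything below (the goal and object vocabularies, `sample_task`) is illustrative, not the paper's actual task representation:

```python
import random

# Hypothetical sketch of a procedurally generated task universe: each task
# is a sampled combination of a goal predicate and target objects, so even
# small vocabularies yield a combinatorially large task space.

OBJECTS = ["cube", "pyramid", "sphere"]
COLORS = ["black", "purple", "yellow"]
GOALS = ["hold", "place_near", "see"]

def sample_task(rng):
    """Sample one task: a goal predicate applied to random objects."""
    goal = rng.choice(GOALS)
    target = (rng.choice(COLORS), rng.choice(OBJECTS))
    if goal == "place_near":
        # This goal relates two objects, so sample a reference object too.
        ref = (rng.choice(COLORS), rng.choice(OBJECTS))
        return (goal, target, ref)
    return (goal, target)

rng = random.Random(0)
tasks = {sample_task(rng) for _ in range(1000)}
print(len(tasks))  # dozens of distinct tasks from a tiny vocabulary
```

Even this toy vocabulary yields nearly a hundred distinct tasks; the paper's generator also varies worlds, co-players, and multi-predicate goals, producing a vastly larger space.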
Paper and results video below:
dpmd.ai/open-ended-pap…
dpmd.ai/open-ended-vid…

By the Open-Ended Learning Team including @maxjaderberg and @wojczarnecki.

Interested in this work? A research scientist role on the team is currently open; learn more: dpmd.ai/OE_RS 5/5

More from @DeepMind

26 Jul
Mixed Integer Programming is an NP-hard optimisation problem arising in planning, logistics, resource allocation, etc.

Presenting a solver with neural heuristics that learns to adapt to the problem domain, outperforming SCIP on Google-scale MIPs: dpmd.ai/13349 (1/)
Practical applications often focus on finding good solutions fast rather than proving optimality. In follow-up work, Neural Neighborhood Selection finds better solutions even faster by learning heuristics for large neighborhood search: dpmd.ai/10201 (2/)
The neural solver learns even on single problem instances, improving the best known solutions to three open MIPLIB problems.

Milo-v12-6-r1-75-1: dpmd.ai/milo-v12-6-r1-…
Neos-1420790: dpmd.ai/neos-1420790
xmas10-2: dpmd.ai/xmas10-2

(3/)
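A minimal sketch of the large-neighborhood-search loop that Neural Neighborhood Selection learns to guide, here on a toy 0/1 knapsack: repeatedly fix most variables at their incumbent values and re-optimise the small "neighborhood" left free. The neighborhood is chosen at random below; in the paper a neural network learns which variables to free. All names and numbers are illustrative, not the paper's solver:

```python
import random

# Toy 0/1 knapsack standing in for a MIP: maximise value within capacity.
values = [10, 7, 5, 8, 3, 6]
weights = [4, 3, 2, 4, 1, 3]
capacity = 8

def objective(x):
    if sum(w for w, xi in zip(weights, x) if xi) > capacity:
        return -1  # infeasible assignments score below the empty solution
    return sum(v for v, xi in zip(values, x) if xi)

def repair(x, free):
    """Re-optimise the free variables by brute force (the neighborhood is small)."""
    best = list(x)
    for mask in range(2 ** len(free)):
        cand = list(x)
        for bit, idx in enumerate(free):
            cand[idx] = (mask >> bit) & 1
        if objective(cand) > objective(best):
            best = cand
    return best

rng = random.Random(0)
x = [0] * len(values)                          # trivial incumbent solution
for _ in range(20):                            # LNS iterations
    free = rng.sample(range(len(values)), 3)   # neighborhood selection step
    x = repair(x, free)
print(objective(x))  # improves monotonically from the trivial incumbent
```

In a real MIP the brute-force `repair` would be a call back into the solver on the sub-problem; the learned component replaces the random `free` choice.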
23 Jul
Yesterday we announced early collaborations using the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date. So how is AlphaFold helping these organisations with their work…? 1/
The Drugs for Neglected Diseases initiative (@DNDi) has advanced their research into life-saving cures for diseases that disproportionately affect the poorer parts of the world. 2/
The @CEI_UoP is using #AlphaFold's predictions to help engineer faster enzymes for recycling some of our most polluting single-use plastics. 3/
22 Jul
Today with @emblebi, we're launching the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome, doubling humanity’s accumulated knowledge of high-accuracy human protein structures - for free: dpmd.ai/alphafolddb 1/
We’re also sharing the proteomes of 20 other biologically significant organisms, totalling over 350k structures. Soon we plan to expand to over 100 million, covering almost every sequenced protein known to science & the @uniprot reference database.

dpmd.ai/alphafold-blog 2/
We’re excited to see how this will enable and accelerate research for scientists around the world. We've already seen promising signals from early collaborators using #AlphaFold in their own work, including @DNDi, @CEI_UoP, @UCSF & @CUBoulder: dpmd.ai/alphafold-case… 3/
6 Jul
Many models bake in domain knowledge to control how input data is processed. This means models must be redesigned to handle new types of data.

Introducing the Perceiver, an architecture that works on many kinds of data - in some cases all at once: dpmd.ai/perceiver (1/)
Like Transformers, Perceivers process inputs using attention. But unlike Transformers, they first map inputs to a small latent space where processing is cheap & doesn’t depend on the input size. This allows us to build deep networks even when using large inputs like images. (2/)
Perceivers can learn a different attention pattern for each type of data (shown for images and video), making it easy for them to adapt to new data and unexplored problems where researchers may not know what kinds of patterns they should be looking for. (3/)
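The latent-bottleneck idea in tweet 2 can be sketched in a few lines of NumPy. This is a toy illustration of the mechanism, not the official implementation; all shapes and sizes are invented:

```python
import numpy as np

# Toy Perceiver-style cross-attention: a small, fixed-size latent array
# attends to a large input, so the deep processing stack that follows
# operates only on the latents, independent of the input's size.

rng = np.random.default_rng(0)

def cross_attention(latents, inputs):
    """Latents (N x D) attend to inputs (M x D); output is N x D."""
    scores = latents @ inputs.T / np.sqrt(latents.shape[1])   # N x M
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over inputs
    return weights @ inputs                                   # N x D

latents = rng.normal(size=(8, 16))          # small latent array (N=8)
image_like = rng.normal(size=(50_176, 16))  # large byte array, e.g. 224*224 pixels

out = cross_attention(latents, image_like)
print(out.shape)  # (8, 16): latent size is unchanged regardless of input length
```

Because subsequent layers operate on the 8x16 latents, stacking more self-attention depth costs the same no matter how long the input is; only the initial cross-attention touches every input element.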
14 May
In a new paper, our team tackles a fundamental AI problem: how can we parse the world into objects and properties while simultaneously inducing the rules that explain how objects change over time: dpmd.ai/3fmrxsn (1/)
Work by @LittleBimble with @pfau, @pushmeet, Matko Bosnjak, Lars Buesing, Kevin Ellis, and Marek Sergot. (2/)
This system combines the Apperception Engine with a binary neural network to learn a provably 100% accurate model of non-trivial environments (e.g. Sokoban) from noisy raw pixel data. (3/)
7 May
Multimodal transformers achieve impressive results on many tasks like Visual Question Answering and Image Retrieval, but what contributes most to their success? dpmd.ai/3h8u23Z (1/)
This work explores how different architecture variations, pretraining datasets, and losses impact multimodal transformers’ performance on image retrieval: dpmd.ai/3eENAtF

(By Lisa Anne Hendricks, John Mellor, Rosalia Schneider, @jalayrac & @aidanematzadeh) (2/)
Multimodal transformers outperform simpler dual encoder architectures when the amount of data is held constant. Interestingly, larger datasets don’t always improve performance. (3/)
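To make the dual-encoder baseline concrete (a hypothetical sketch, not the paper's models): a dual encoder embeds each modality separately, so retrieval over a whole corpus is a single matrix product, whereas a multimodal transformer must run a joint forward pass per image-text pair. That efficiency gap is why the finding above, that joint models win at fixed data, is a real trade-off:

```python
import numpy as np

# Dual-encoder retrieval: image embeddings are precomputed offline once,
# and a text query is scored against all of them with one matmul.
# Dimensions here are arbitrary, for illustration only.

rng = np.random.default_rng(0)
n_images, dim = 1000, 64

image_embs = rng.normal(size=(n_images, dim))  # precomputed once, offline

def dual_encoder_scores(text_emb):
    """Score the text query against every image in one matrix product."""
    return image_embs @ text_emb

text_emb = rng.normal(size=dim)
scores = dual_encoder_scores(text_emb)
best = int(np.argmax(scores))
print(scores.shape, best)  # one score per candidate image; index of the top one
```

A multimodal transformer would instead concatenate text and image tokens and attend jointly, which is more expressive but cannot reuse precomputed per-image embeddings.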
