Introducing the '21 DeepMind x @ai_ucl Reinforcement Learning Lecture Series, a comprehensive introduction to modern RL.

Follow along with our researchers as they explore Markov Decision Processes, sample-based learning algorithms & much more: dpmd.ai/2021RLseries 1/2
Also find the full series via the DeepMind @YouTube channel: dpmd.ai/DeepMindxUCL21
In the first lecture of the series, Research Scientist Hado introduces the course and explores the fascinating connection between reinforcement learning and artificial intelligence: dpmd.ai/RLseries1

#DeepMindxUCL @ai_ucl
In lecture two, Research Scientist Hado explains why it's important for learning agents to balance exploring with exploiting the knowledge they have already acquired: dpmd.ai/RLseries2

#DeepMindxUCL @ai_ucl
In the third lecture, Research Scientist Diana shows us how to solve MDPs with dynamic programming to extract accurate predictions and good control policies: dpmd.ai/RLseries3

#DeepMindxUCL @ai_ucl
In lecture four, Diana covers dynamic programming algorithms as contraction mappings, looking at when and how they converge to the right solutions: dpmd.ai/RLseries4

#DeepMindxUCL @ai_ucl
In this lecture, Hado explores model-free prediction and its relation to Monte Carlo and temporal difference algorithms: dpmd.ai/RLseries5

#DeepMindxUCL @ai_ucl
In part two of the model-free lecture, Hado explains how to use prediction algorithms for policy improvement, leading to algorithms - like Q-learning - that can learn good behaviour policies from sampled experience: dpmd.ai/RLseries6

#DeepMindxUCL @ai_ucl
In this lecture, Hado explains how deep learning and reinforcement learning combine into deep reinforcement learning. He looks at the properties and difficulties that arise when combining function approximation with RL algorithms: dpmd.ai/RLseries7

#DeepMindxUCL @ai_ucl
In this lecture, Research Engineer Matteo explains how to learn and use models, including algorithms like Dyna and Monte-Carlo tree search (MCTS): dpmd.ai/RLseries8

#DeepMindxUCL @ai_ucl

