In the first lecture of the series, Research Scientist Hado introduces the course and explores the fascinating connection between reinforcement learning and artificial intelligence: dpmd.ai/RLseries1
In lecture two, Research Scientist Hado explains why it's important for learning agents to balance exploring the environment with exploiting acquired knowledge: dpmd.ai/RLseries2
In the third lecture, Research Scientist Diana shows us how to solve MDPs with dynamic programming to extract accurate predictions and good control policies: dpmd.ai/RLseries3
In lecture four, Diana covers dynamic programming algorithms as contraction mappings, looking at when and how they converge to the right solutions: dpmd.ai/RLseries4
In part two of the model-free lecture, Hado explains how to use prediction algorithms for policy improvement, leading to algorithms - like Q-learning - that can learn good behaviour policies from sampled experience: dpmd.ai/RLseries6
In this lecture, Hado explains how to combine deep learning with reinforcement learning for deep reinforcement learning. He looks at the properties and difficulties that arise when combining function approximation with RL algorithms: dpmd.ai/RLseries7
In this lecture, Research Engineer Matteo explains how to learn and use models, including algorithms like Dyna and Monte-Carlo tree search (MCTS): dpmd.ai/RLseries8
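The lectures above cover Q-learning from sampled experience and Dyna-style planning with a learned model. As a rough companion to those ideas, here is a minimal tabular Dyna-Q sketch in Python; the environment interface (`reset`, `step`, `actions`), the hyperparameters and the helper names are illustrative assumptions, not code from the course.

```python
# Minimal tabular Dyna-Q sketch (illustrative only): Q-learning updates from
# real transitions, plus extra planning updates replayed from a learned model.
# The env interface (reset/step/actions) is an assumed toy API, not the course's.
import random
from collections import defaultdict

def dyna_q(env, episodes=100, alpha=0.1, gamma=0.99, epsilon=0.1, planning_steps=10):
    q = defaultdict(float)   # q[(state, action)] -> action-value estimate
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def act(state):
        # Epsilon-greedy behaviour policy (the exploration/exploitation trade-off of lecture two).
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = act(state)
            reward, next_state, done = env.step(action)        # sampled experience

            # Model-free Q-learning update.
            target = reward + (0.0 if done else gamma * max(q[(next_state, a)] for a in env.actions))
            q[(state, action)] += alpha * (target - q[(state, action)])

            # Dyna: remember the transition, then plan with the learned model.
            model[(state, action)] = (reward, next_state, done)
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = random.choice(list(model.items()))
                t = r + (0.0 if d else gamma * max(q[(s2, b)] for b in env.actions))
                q[(s, a)] += alpha * (t - q[(s, a)])

            state = next_state
    return q
```

Monte-Carlo tree search, also covered in the model-based lecture, instead plans forward from the current state by simulating with the model rather than replaying stored transitions.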
From packing an umbrella to preparing for extreme conditions, predicting short term weather patterns is crucial for daily life.
New research with the @metoffice uses a state-of-the-art model to advance the science of Precipitation Nowcasting - the prediction of rain: dpmd.ai/nowcasting 1/4
Today’s weather systems provide planet-scale predictions several days ahead, but often struggle to generate high-resolution predictions for short lead times. Nowcasting fills this performance gap with predictions of rainfall within the next 1-2 hours. 2/4
In comparisons with widely used nowcasting methods, meteorologists from the @metoffice rated this method as their first choice 89% of the time.
There's more to do but our researchers hope this will act as a base for future work & promote greater integration of ML & environmental science. 3/4
Reinforcement learning typically trains & tests agents on the same game. New work shows how our team trains generally capable agents on huge game spaces, resulting in agents that generalise to held-out test games, & learn behaviours like experimentation: dpmd.ai/open-ended-blog 1/
Rather than training on a limited number of tasks, our team defines a whole universe of tasks that can be procedurally generated, from simple object finding games to complex strategic games like Capture the Flag. 2/
By constructing a hierarchical learning process with an open-ended and iteratively refined objective, it was possible to train agents that never stop learning, and develop increasingly general behaviour across games. 3/
Mixed Integer Programming is an NP-hard optimisation problem arising in planning, logistics, resource allocation, etc.
Presenting a solver with neural heuristics that learns to adapt to the problem domain, outperforming SCIP on Google-scale MIPs: dpmd.ai/13349 (1/)
Practical applications often focus on finding good solutions fast rather than proving optimality. In follow-up work, Neural Neighborhood Selection finds better solutions even faster by learning heuristics for large neighborhood search: dpmd.ai/10201 (2/)
The neural solver learns even on single problem instances, improving the best known solutions to three open MIPLIB problems.
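For readers unfamiliar with the problem class in this thread, the toy example below shows what a tiny MIP looks like and solves it with SciPy's off-the-shelf `milp` interface (SciPy 1.9+); the instance is invented purely for illustration and has no connection to the neural solver, SCIP or the MIPLIB benchmarks above.

```python
# A toy Mixed Integer Program, just to illustrate the problem class:
#   maximise  3*x0 + 2*x1
#   subject to  x0 + x1 <= 4,  0 <= x0 <= 2.5,  x1 >= 0,  x0 and x1 integer.
# Solved with SciPy's generic MILP interface; the instance is made up for illustration.
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

c = np.array([-3.0, -2.0])                       # milp minimises, so negate to maximise
constraints = LinearConstraint(np.array([[1.0, 1.0]]), lb=-np.inf, ub=4.0)
bounds = Bounds(lb=[0.0, 0.0], ub=[2.5, np.inf])
integrality = np.ones(2)                         # both variables are required to be integers

result = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
print(result.x, -result.fun)                     # e.g. [2. 2.] with objective value 10.0
```

Production solvers such as SCIP layer presolve, cutting planes and branching heuristics on top of this basic formulation; the work in this thread learns some of those heuristics from data instead of hand-designing them.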
Yesterday we announced early collaborations using the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date. So how is AlphaFold helping these organisations with their work…? 1/
The Drugs for Neglected Diseases initiative (@DNDi) has advanced their research into life-saving cures for diseases that disproportionately affect the poorer parts of the world. 2/
The @CEI_UoP is using #AlphaFold's predictions to help engineer faster enzymes for recycling some of our most polluting single-use plastics. 3/
Today with @emblebi, we're launching the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome, doubling humanity’s accumulated knowledge of high-accuracy human protein structures - for free: dpmd.ai/alphafolddb 1/
We’re also sharing the proteomes of 20 other biologically significant organisms, totalling over 350k structures. Soon we plan to expand to over 100 million, covering almost every sequenced protein known to science & the @uniprot reference database. 2/
We’re excited to see how this will enable and accelerate research for scientists around the world. We've already seen promising signals from early collaborators using #AlphaFold in their own work, including @DNDi, @CEI_UoP, @UCSF & @CUBoulder: dpmd.ai/alphafold-case… 3/
Many models bake in domain knowledge to control how input data is processed. This means models must be redesigned to handle new types of data.
Introducing the Perceiver, an architecture that works on many kinds of data - in some cases all at once: dpmd.ai/perceiver (1/)
Like Transformers, Perceivers process inputs using attention. But unlike Transformers, they first map inputs to a small latent space where processing is cheap & doesn’t depend on the input size. This allows us to build deep networks even when using large inputs like images. (2/)
Perceivers can learn a different attention pattern for each type of data (shown for images and video), making it easy for them to adapt to new data and unexplored problems where researchers may not know what kinds of patterns they should be looking for. (3/)
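To make the latent bottleneck described in (2/) concrete, here is a minimal single-head NumPy sketch of cross-attention from a small learned latent array to a much larger input array; all shapes, sizes and weight initialisations are illustrative assumptions rather than the published Perceiver configuration.

```python
# Minimal NumPy sketch of the Perceiver's core trick: a small latent array
# cross-attends to a large input array, so the expensive attention step scales
# with (num_latents x num_inputs) rather than (num_inputs x num_inputs).
# Shapes and sizes below are illustrative, not the published configuration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, inputs, w_q, w_k, w_v):
    """latents: [N, D_lat], inputs: [M, D_in] with M >> N."""
    q = latents @ w_q                                   # [N, D]
    k = inputs @ w_k                                    # [M, D]
    v = inputs @ w_v                                    # [M, D]
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # [N, M]: linear in input size M
    return scores @ v                                   # [N, D]: inputs summarised into the latents

rng = np.random.default_rng(0)
d_in, d, n_latents, n_inputs = 32, 64, 128, 50_000      # e.g. 50k pixels distilled into 128 latents
latents = rng.normal(size=(n_latents, d))
inputs = rng.normal(size=(n_inputs, d_in))
w_q, w_k, w_v = (rng.normal(size=s) for s in [(d, d), (d_in, d), (d_in, d)])
print(cross_attention(latents, inputs, w_q, w_k, w_v).shape)   # (128, 64)
```

Because the score matrix is only [num_latents x num_inputs], the cost grows linearly with the input size, which is what lets the self-attention layers that operate on the small latent array stay cheap no matter how large the raw input is.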