In the first lecture of the series, Research Scientist Hado introduces the course and explores the fascinating connection between reinforcement learning and artificial intelligence: dpmd.ai/RLseries1
In lecture two, Research Scientist Hado explains why it's important for learning agents to balance exploration with exploiting the knowledge they have already acquired: dpmd.ai/RLseries2
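One common way to illustrate the explore/exploit trade-off is an epsilon-greedy agent on a multi-armed bandit. The sketch below is a minimal illustration, not material taken from the lecture: the Gaussian reward model, the arm means and the name `epsilon_greedy_bandit` are all assumptions made for the example.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Gaussian bandit (hypothetical setup).

    Returns the per-arm value estimates and how often each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print(counts)
```

With `epsilon=0` the agent never explores and can lock onto a poor arm; with `epsilon=1` it never uses what it has learned.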
In the third lecture, Research Scientist Diana shows us how to solve MDPs with dynamic programming to extract accurate predictions and good control policies: dpmd.ai/RLseries3
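As a rough illustration of solving an MDP with dynamic programming, here is a value iteration sketch on a tiny hand-made MDP. The two-state MDP, the discount `GAMMA` and the function names are hypothetical and chosen only so the answer is easy to check by hand.

```python
# Hypothetical MDP: P[state][action] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9

def backup(values, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    values = {s: 0.0 for s in P}
    while True:
        new_values = {s: max(backup(values, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_values[s] - values[s]) for s in P)
        values = new_values
        if delta < tol:
            break
    # Control: act greedily with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: backup(values, s, a)) for s in P}
    return values, policy

values, policy = value_iteration()
print(values, policy)
```

Here always choosing action 1 is optimal, so the values approach 19 for state 0 and 20 for state 1.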
In lecture four, Diana covers dynamic programming algorithms as contraction mappings, looking at when and how they converge to the right solutions: dpmd.ai/RLseries4
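For readers who want the contraction property spelled out, the standard statement for the Bellman optimality operator is sketched below. The notation ($T^*$, $p$, $r$, $\gamma$) is assumed here and may differ from the lecture's; the result requires a discount factor $\gamma < 1$.

```latex
% Bellman optimality operator applied to a value function v
(T^* v)(s) = \max_a \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, v(s') \bigr]

% T^* is a gamma-contraction in the sup norm
\lVert T^* v - T^* u \rVert_\infty \le \gamma \, \lVert v - u \rVert_\infty

% Consequence (Banach fixed-point theorem): a unique fixed point v_* exists,
% and the iterates v_{k+1} = T^* v_k converge to it geometrically
\lVert v_k - v_* \rVert_\infty \le \gamma^k \, \lVert v_0 - v_* \rVert_\infty
```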
In part two of the model-free lecture, Hado explains how to use prediction algorithms for policy improvement, leading to algorithms - like Q-learning - that can learn good behaviour policies from sampled experience: dpmd.ai/RLseries6
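A minimal tabular Q-learning sketch shows how good behaviour can be learned from sampled experience alone. The four-state chain environment, the constants and the uniformly random behaviour policy are assumptions made for illustration, not details from the lecture.

```python
import random

N_STATES, GOAL = 4, 3
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Hypothetical chain: action 0 moves left, 1 moves right; reaching GOAL pays 1."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.randrange(2)  # behave randomly; Q-learning is off-policy
            nxt, reward, done = step(state, action)
            # Bootstrap from the greedy value of the next state.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)
```

Even though the agent acts at random while learning, the greedy policy read off the table moves right in every non-terminal state.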
In this lecture, Hado explains how to combine deep learning with reinforcement learning for deep reinforcement learning. He looks at the properties and difficulties that arise when combining function approximation with RL algorithms: dpmd.ai/RLseries7
In this lecture, Research Engineer Matteo explains how to learn and use models, including algorithms like Dyna and Monte-Carlo tree search (MCTS): dpmd.ai/RLseries8
Introducing AlphaQubit: our AI-based system that can more accurately identify errors inside quantum computers. 🖥️⚡
This research is a joint venture with @GoogleQuantumAI, published today in @Nature → goo.gle/3ZflWMn
The possibilities in quantum computing are compelling. ♾️
Quantum computers can solve certain problems in a few hours that would take a classical computer billions of years. This can help lead to advances in areas ranging from drug discovery to material design.
But building a stable quantum system is a challenge.
Qubits are units of information that underpin quantum computing. These can be disrupted by microscopic defects in hardware, heat, vibration, and more.
Quantum error correction addresses this by grouping multiple noisy qubits into a redundant unit called a “logical qubit”. A decoder then uses consistency checks on that logical qubit to protect the information it stores.
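To make the idea of redundancy, consistency checks and a decoder concrete, here is a classical toy analogue: a 3-bit repetition code with a lookup-table decoder. This is only an illustration of the general principle. It is not AlphaQubit, and real quantum codes have to handle more error types than the single bit flips assumed here.

```python
def encode(bit):
    """Store one logical bit redundantly in three physical bits."""
    return [bit, bit, bit]

def syndrome(bits):
    """Parity checks on neighbouring pairs; they flag errors without reading the logical value."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

# Syndrome -> index of the bit to flip (None means no error detected).
# Assumes at most one bit has been flipped.
CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def decode(bits):
    fixed = list(bits)
    flip = CORRECTION[syndrome(fixed)]
    if flip is not None:
        fixed[flip] ^= 1
    return fixed[0]

noisy = encode(1)
noisy[2] ^= 1  # a single bit-flip error on the third bit
print(decode(noisy))
```

The lookup table works only while at most one bit flips; a decoder's job gets harder as errors become more frequent and correlated.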
In our experiments, our decoder AlphaQubit made the fewest errors.
Our latest generative technology is now powering MusicFX DJ in @LabsDotGoogle - and we’ve also updated Music AI Sandbox, a suite of experimental music tools which can streamline creation. 🎵
This will make it easier than ever to make music in real time with AI. ✨ goo.gle/4eTg28Z
MusicFX DJ lets you input multiple prompts and include details on instruments, genres and vibes to create music. 🎛️
We’ve updated and improved the interface using feedback from @YouTube’s Music AI Incubator.
Two key innovations lie at the core of MusicFX DJ.
🔘 We adapted our models to perform real-time streaming by training them to generate the next 2 seconds of music, based on the previous 10 seconds.
🔘 The player steers a “style embedding”, which is a mix of text prompt embeddings set by the slider values.
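The two ideas above can be sketched together as a loop that generates 2-second chunks from a rolling 10-second context, conditioned on a blended style vector. Everything here is hypothetical: `fake_model` is a stand-in for the real generator, and the normalized weighted average in `mix_style` is just one plausible mixing rule, not a description of how MusicFX DJ actually combines prompts.

```python
from collections import deque

CONTEXT_SECONDS, CHUNK_SECONDS = 10, 2

def mix_style(prompt_embeddings, slider_values):
    """Blend prompt embeddings by slider weight (assumed rule: weighted average)."""
    total = sum(slider_values)
    dim = len(prompt_embeddings[0])
    return [
        sum(w * emb[i] for w, emb in zip(slider_values, prompt_embeddings)) / total
        for i in range(dim)
    ]

def fake_model(context, style):
    """Placeholder generator: returns one dummy value per second of audio."""
    return [sum(style)] * CHUNK_SECONDS

def stream(prompt_embeddings, slider_values, total_seconds):
    context = deque(maxlen=CONTEXT_SECONDS)  # only the most recent 10 s are kept
    output = []
    while len(output) < total_seconds:
        style = mix_style(prompt_embeddings, slider_values)  # re-read sliders each chunk
        chunk = fake_model(list(context), style)
        context.extend(chunk)
        output.extend(chunk)
    return output, list(context)

style = mix_style([[1.0, 0.0], [0.0, 1.0]], [3, 1])
audio, context = stream([[1.0, 0.0], [0.0, 1.0]], [3, 1], total_seconds=14)
print(style, len(audio), len(context))
```

Recomputing the style before every chunk is what would let slider changes take effect within a couple of seconds in a design like this.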
Meet our AI-powered robot that’s ready to play table tennis. 🤖🏓
It’s the first agent to achieve amateur human-level performance in this sport. Here’s how it works. 🧵
Robotic table tennis has served as a benchmark for this type of research since the 1980s.
The robot has to be good at low-level skills, such as returning the ball, as well as high-level skills, like strategizing and long-term planning to achieve a goal.
To train the robot, we gathered a dataset of initial table tennis ball states - which included information about position, speed, and spin.
The system practiced using this library and learned different skills, like forehand topspin, backhand targeting, and returning serves.
AI systems can be powerful but opaque "black boxes" - even to researchers who train them. ⬛
Enter Gemma Scope: a set of open tools made up of sparse autoencoders to help decode the inner workings of Gemma 2 models, and better address safety issues. → dpmd.ai/gemma-scope
Language models turn your text input into a series of ‘activations’ - which map the relationships between the words you’ve entered and help the model write its answer. 💬
Activations at different layers in the model’s neural network represent increasingly advanced concepts, known as ‘features’.
Activations are made up of neurons, which “fire” for many unrelated features - making them hard to decipher.
Each feature seems to be a specific combination of neurons - but how can we find the meaningful combinations of neurons?
We’re also introducing ShieldGemma: a series of state-of-the-art safety classifiers designed to filter harmful content. 🛡️
These target hate speech, harassment, sexually explicit material and more, both in the input and output stages.
Finally, we’re announcing Gemma Scope, a set of tools to help researchers examine how Gemma 2 makes decisions. 🔍
It's a comprehensive, open suite of sparse autoencoders - specialized neural networks that zoom into the model’s inner workings and make them more interpretable.
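For intuition, a sparse autoencoder can be sketched as an encoder that expands an activation vector into many non-negative features, a decoder that reconstructs the activation from them, and a loss that rewards using few features. The code below is a generic ReLU autoencoder with an L1 penalty and hand-picked weights; it is an assumption for illustration and not necessarily the architecture or training objective used in Gemma Scope.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def sae_forward(activation, w_enc, b_enc, w_dec, b_dec, l1_coeff=0.01):
    """Encode to sparse features, decode back, and score the result."""
    pre = [p + b for p, b in zip(matvec(w_enc, activation), b_enc)]
    features = relu(pre)  # most entries should end up at zero
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    mse = sum((a - r) ** 2 for a, r in zip(activation, recon)) / len(activation)
    # Features are non-negative, so their sum equals their L1 norm.
    loss = mse + l1_coeff * sum(features)
    return features, recon, loss

# Hypothetical weights: 2 "neurons" expanded into 4 candidate features.
w_enc = [[1, 0], [-1, 0], [0, 1], [0, -1]]
w_dec = [[1, -1, 0, 0], [0, 0, 1, -1]]
features, recon, loss = sae_forward([0.5, -2.0], w_enc, [0.0] * 4, w_dec, [0.0] * 2)
print(features, recon, loss)
```

In this toy case only two of the four features are active, yet the original activation is reconstructed exactly.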