Discover and read the best of Twitter Threads about #RL

Most recent (12)

This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts

The embodied agent can perceive its surroundings, manipulate the world, and react to human instructions in a 3D environment
Work done by the Interactive Team at @deepmind between 2019 and 2022
🧵
Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent using Imitation Learning is outlined
The environment "The Playroom" is generated
The general multi-modal architecture is crafted
In the end, a GAIL-like auxiliary loss proves crucial
1/n
Interactive Agents with IL & SSL
arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL
A hierarchical 8-step action scheme was introduced
New agent code name: MIA
2/n
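The contrastive cross-modal loss that replaced GAIL can be sketched as an InfoNCE-style objective over paired vision and language embeddings. This is a generic illustration (function name, batch shapes, and temperature are assumptions), not the paper's exact formulation:

```python
import numpy as np

def cross_modal_contrastive_loss(vision_emb, text_emb, temperature=0.1):
    """InfoNCE-style loss: each vision embedding should be most similar
    to its paired language embedding; other rows in the batch serve as
    negatives. Generic sketch, not the paper's exact objective."""
    v = vision_emb / np.linalg.norm(vision_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on diagonal

v = np.eye(4, 8)                                 # orthonormal toy embeddings
loss_aligned = cross_modal_contrastive_loss(v, v)            # correct pairing
loss_mismatched = cross_modal_contrastive_loss(v, np.roll(v, 1, axis=0))
```

With correct pairing the loss is near zero; with shuffled pairs it is large, which is the gradient signal that pulls matching modalities together.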
ChatGPT for Robotics?
@DeepMind's latest work: a general AI agent that can perform any task from human instructions!

Or at least those allowed in "the playhouse"

The cherry on top of this agent is its RL fine-tuning from human feedback, or RLHF. As in ChatGPT
1/n
The base layer of the agent is trained with imitation learning and conditioned on language instructions

Initially, the agent had mediocre abilities

However, when it was fine-tuned with Reinforcement Learning and allowed to act independently, its abilities improved significantly 🆙

2/n
The authors structured the RL problem by training a Reward Model on human feedback, and then using that reward model to optimize the agent with online RL

The reward model, called the Inter-temporal Bradley-Terry (IBT) model, is trained to predict human preferences between sub-trajectories

3/n
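The Bradley-Terry preference loss at the heart of this reward model can be sketched as follows; the per-step rewards, segment lengths, and function name here are illustrative, not DeepMind's exact IBT objective:

```python
import numpy as np

def bradley_terry_loss(reward_a, reward_b, a_preferred):
    """Preference loss for a pair of sub-trajectories: the reward model
    scores each step, returns are summed over the segment (the
    'inter-temporal' part), and Bradley-Terry says
    P(A preferred) = sigmoid(R_A - R_B). Generic sketch of the idea."""
    ret_a, ret_b = np.sum(reward_a), np.sum(reward_b)
    p_a = 1.0 / (1.0 + np.exp(-(ret_a - ret_b)))   # P(human prefers A)
    return -np.log(p_a if a_preferred else 1.0 - p_a)

# One labeled comparison where humans preferred segment A:
loss_consistent = bradley_terry_loss([0.9, 0.8], [0.1, 0.0], a_preferred=True)
loss_inconsistent = bradley_terry_loss([0.1, 0.0], [0.9, 0.8], a_preferred=True)
```

Minimizing this loss pushes the reward model to assign higher summed reward to the segments humans preferred, which the online RL stage then optimizes against.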
Excited to share the details of our work at @DeepMind on using reinforcement learning to help large-scale commercial cooling systems save energy and run more efficiently: arxiv.org/abs/2211.07357.

Here’s what we found 🧵
First, #RL can substantially outperform industry standard controllers.

📉 We reduced energy use by 9% and 13% at two separate sites, while satisfying all of the constraints at a level comparable with the baseline policy.
🔧 We built on the existing RL controller used for cooling Google’s data centers and extended it to a more challenging setup.

There’s a higher dimensional action space (jointly controlling multiple chillers), more complex constraints, and less data standardization.
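One common way to trade off energy against operating constraints is a shaped reward that penalizes violations. This toy sketch (function name, units, and penalty weight are all invented) only illustrates the idea; the paper's actual constraint handling is more involved:

```python
import numpy as np

def shaped_reward(energy_kw, temps_c, temp_limits_c, penalty=20.0):
    """Toy reward for a cooling controller: pay for energy used and pay
    a steep penalty per degree of temperature-constraint violation.
    Purely illustrative, not the paper's reward function."""
    violations = np.maximum(0.0, np.asarray(temps_c) - np.asarray(temp_limits_c))
    return -energy_kw - penalty * violations.sum()

limits = [24.0, 24.0]
r_safe = shaped_reward(100.0, [22.0, 23.5], limits)       # within limits
r_violating = shaped_reward(90.0, [25.0, 23.5], limits)   # cheaper but too hot
```

A large enough penalty makes the energy-cheaper but constraint-violating action worse, steering the policy toward the feasible region.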
Do #RL models have scaling laws like LLMs?
#AlphaZero does, and the laws imply SotA models were too small for their compute budgets.
Check out our new paper:
arxiv.org/abs/2210.00849
Summary 🧵(1/7):
We train AlphaZero MLP agents on Connect Four & Pentago, and find three power-law scaling relations.
Performance scales as a power of parameters or compute when not bottlenecked by the other, and optimal NN size scales as a power of available compute. (2/7)
When AlphaZero learns to play Connect4 & Pentago with plenty of training steps, Elo scales as a log of parameters. The Bradley-Terry playing strength (basis of Elo rating) scales as a power of parameters.
The scaling law only breaks when we reach perfect play. (3/7)
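Reading a scaling exponent off data like this is a straight-line fit in log-log space. The synthetic exponent 0.6 below is made up purely for illustration, not the paper's measured value:

```python
import numpy as np

# Synthetic playing-strength data following strength ∝ N^0.6,
# to show how a power-law exponent is recovered from a log-log fit,
# as done for AlphaZero's Bradley-Terry strength.
params = np.array([1e4, 1e5, 1e6, 1e7])   # model sizes N
strength = 2.0 * params ** 0.6            # power law with exponent 0.6

slope, intercept = np.polyfit(np.log(params), np.log(strength), 1)
# slope is the recovered power-law exponent
```

On real data the fit is noisy and, as the thread notes, the law only holds until it saturates at perfect play.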
Why has reinforcement learning not been applied to the design process?

I believe the RL problem and the design process are pretty similar, and the RL community should embrace the design process as a potential application for RL algorithms. (1/N)
In this thread, I will briefly elaborate on why the two are similar in my eyes.

Design is a crucial step in making things, but it is not easy to find a single definition for it. In the context of architectural design, ... (2/N)
... one might define the design process as a series of steps followed by the designer to iteratively find a solution for a given design scenario.

In his 1964 book, Notes on the Synthesis of Form, Christopher Alexander wrote:

"The ultimate object of design is form." (3/N)
#Cambi
#RL
The Swiss and American confederations steamrolled the pre-existing populations.
#Cambi
Last year the IMF denied funds to Ukraine on the grounds that it is a corrupt country.
#Cambi
Mattarella, anything but impartial, did not want Savona; the Savona plan would have saved us now with the alternative currency, without leaving the euro;
the only way is common debt.
I just open-sourced my implementation of @DeepMind's original DQN paper! But this time it's a bit different!

There are 2 reasons for this, see the thread.

GitHub: github.com/gordicaleksa/p…

#rl #deeplearning
1) This time the project is not completely ready yet**. I have yet to reproduce the published results - so I encourage you to contribute!

Many of you have been asking me whether you can work on a project with me and I'll finally start doing it that way - from now onwards. ❤
2) This repo has the ambition to grow and become the go-to resource for learning RL. So collaborators are definitely welcome as I won't always have the time myself.

** main reasons are:
a) I was very busy over the last 2 weeks
b) It currently takes ~5 days to fully train DQN
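The core update any DQN implementation has to get right is the one-step TD target from the paper. A minimal NumPy sketch of that rule (array shapes and variable names are my own):

```python
import numpy as np

def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """One-step TD targets from the original DQN paper:
    y = r + gamma * max_a' Q_target(s', a'), with bootstrapping cut off
    at terminal transitions."""
    next_max = next_q.max(axis=1)                    # greedy value of s'
    return rewards + gamma * (1.0 - dones) * next_max

r = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0], [1.0, 3.0]])          # Q_target(s', .)
dones = np.array([0.0, 1.0])                         # second transition terminal
y = dqn_targets(r, next_q, dones)
```

The full agent then regresses its online Q-network toward these targets on minibatches sampled from replay, which is where most of that ~5-day training time goes.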
How can adaptive interfaces and #HCI benefit from #AI and reinforcement learning?
🧵 A thread on our #CHI2021 paper w. @gilles_bailly @luileito @oulasvirta
🌐 Project page: userinterfaces.aalto.fi/adaptive
🎥 Watch the video:
👇👇 @AaltoResearch @sig_chi @sigchi
The secret sauce is to make decisions via #planning 🧠 Remember how AlphaGo by @DeepMind could plan moves and consistently win at Go? Turns out you can use similar #RL methods for HCI applications too📱💻
One such promising case is where UIs #adapt automatically to users 🔄 But what does a "win" even mean here? How could the system tell whether the decisions it is making are actually good?🤔
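In the simplest form, planning here means scoring each candidate adaptation with a predictive user model and picking the best. This is a hypothetical sketch — the names and cost numbers are invented, and the paper's planner looks ahead over sequences of adaptations rather than a single step:

```python
# Myopic planner: choose the UI adaptation the user model predicts
# to be cheapest for the user. Purely illustrative names and values.
def plan_adaptation(candidates, predicted_cost):
    """Return the adaptation minimizing the model's predicted user cost."""
    return min(candidates, key=predicted_cost)

# toy user model: predicted selection time (seconds) per menu layout
costs = {"keep_layout": 1.2, "promote_frequent_items": 0.8, "reorder_all": 1.5}
best = plan_adaptation(costs, costs.get)
```

The "win" the thread asks about is exactly this predicted cost: the system can tell a decision is good when the user model says it reduces effort.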
Reinforcement Learning and Planning? Submissions are welcome to the workshop "Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)." Deadline Feb 24. Workshop date: June 8 or 9 (TBD). prl-theworkshop.github.io #RL #AI #Planning #ML #Reasoning #icaps #prl2021 /1
In the last edition, we accepted 20+ papers, had 5 invited speakers, 4 discussions and 100+ Zoom participants. See papers, posters and talks recording at icaps20subpages.icaps-conference.org/workshops/prl/ /2
Why a WS on PRL? 1) Pure RL and Planning deal with different problems, but both have been looking into each other's techniques to address their own challenges. /3
📢📢Dopamine now also runs on #JAX !!📢📢

Happy to announce that our #RL library, dopamine, now also has JAX implementations of all our agents, including a new agent: QR-DQN!
github.com/google/dopamine
1/X
The JAX philosophy goes very well with that of Dopamine: flexibility for research without sacrificing simplicity.
I've been using it for a while now and I've found its modularity quite appealing in terms of simplifying some of the more difficult aspects of the agents.
2/X
Consider the projection operator used for the C51 and Rainbow agents, which is a rather complex TF op. This went from 23 lines of fairly complex TF code to 9 lines of JAX code that is more straightforward.
3/X
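For reference, the projection operator in question redistributes probability mass onto a fixed categorical support. This is a NumPy sketch of that operator, not Dopamine's actual TF or JAX code:

```python
import numpy as np

def project_distribution(atoms, probs, target_support):
    """C51-style categorical projection: clip each atom into the target
    range, then split its probability mass between the two nearest
    bins of the fixed, evenly spaced support."""
    v_min, v_max = target_support[0], target_support[-1]
    delta = target_support[1] - target_support[0]
    b = (np.clip(atoms, v_min, v_max) - v_min) / delta   # fractional bins
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    projected = np.zeros_like(target_support)
    for p, bi, lo, up in zip(probs, b, lower, upper):
        if lo == up:                       # atom lands exactly on a bin
            projected[lo] += p
        else:                              # split mass between neighbors
            projected[lo] += p * (up - bi)
            projected[up] += p * (bi - lo)
    return projected

support = np.linspace(-1.0, 1.0, 5)        # atoms at -1, -0.5, 0, 0.5, 1
out = project_distribution(np.array([0.25]), np.array([1.0]), support)
```

Expressed with vectorized indexing instead of this explicit loop, the same operator collapses to just a few lines, which is the brevity win the thread describes in JAX.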
#Moin it's definitely quieter over there.

The Twitter break did me good & I'm considering pulling back for a while.

Right now the tension around the #Nazis resonates in every single tweet here, and that tension carries into debates too. (+)
Constantly watching what I write and how I write it, asserting my stance against the #Right every time while still staying unassailable... that exhausts me.
Especially since these #Hetz accounts are semi-automated or exist for exactly that purpose. (+)
By now I have a blocklist of 450k+ accounts; Deep Block Chain is also exhausting, because people end up in it who needn't be there.

That happens because inconspicuous people suddenly turn hard right & others keep following them because they haven't noticed. (+)
When I’m wrong, I believe in admitting it. That includes misplaced solicitations or endorsements. And, if they’re made publicly, then any retraction and/or apology should also be public. So here we are.
Sometimes in our quest to do Good we can be deceived.
A person or an organization may seem to be aligned with our ideals and values, show excellent pedigree and credentials, but in reality are none of these. It may take time for the Truth to become apparent, but I believe in my heart, that it always does.
Such is the case for me, regarding Elizabeth Cronise McLaughlin. I shared her #ResistanceLive broadcasts regularly, suggested people follow her and/or sponsor #RL. However, in light of credible allegations of marginalization by a prominent WOC activist, AND a decision to now