Sergey Levine
Associate Professor at UC Berkeley

Dec 7, 2021, 8 tweets

Intrinsic motivation allows RL to find complex behaviors without hand-designed reward. What makes for a good objective? Information about the world can be translated into energy (or rather, work), so can an intrinsic objective accumulate information? That's the idea in IC2. A 🧵:

The "Maxwell's demon" thought exercise describes how information translates into energy. In one version, the "demon" opens a gate when a particle approaches from one side, but not the other, sorting them into one chamber (against the diffusion gradient). This lowers entropy.

This seems to violate the second law of thermodynamics. The explanation for why it does not is that information about the particles is itself exchangeable with potential energy (that's a gross oversimplification, but this is just a tweet...).
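To make the sorting picture concrete, here is a toy sketch (my illustration, not something from the paper). It assumes a made-up setup: particles sit in a left or right chamber, one random particle approaches the gate per step, and the hypothetical `run_demon` only lets left-to-right crossings through. "Entropy" here is just the coarse binary entropy of the fraction of particles in the left chamber, not a real thermodynamic quantity.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in nats) of a two-outcome distribution with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def run_demon(n_particles=100, steps=2000, seed=0):
    """Toy Maxwell's demon: returns the coarse chamber entropy after each step."""
    rng = random.Random(seed)
    # Start from a roughly even mix of particles across the two chambers.
    in_left = [rng.random() < 0.5 for _ in range(n_particles)]
    history = []
    for _ in range(steps):
        i = rng.randrange(n_particles)  # a random particle approaches the gate
        if in_left[i]:
            # The demon opens the gate only for particles coming from the left,
            # so particles can move left -> right but never back.
            in_left[i] = False
        history.append(binary_entropy(sum(in_left) / n_particles))
    return history


if __name__ == "__main__":
    h = run_demon()
    print(f"entropy after first step: {h[0]:.3f} nats")
    print(f"entropy after last step:  {h[-1]:.3f} nats")
```

The demon has to *know* which side each particle is coming from to do this, which is the sense in which information is being spent to push entropy down.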

The idea behind IC2 (intrinsic control via information capture) is to turn this intuition, "belief entropy minimization", into a practical unsupervised RL algorithm! There are a few variants of this principle, but they all train a latent belief model & minimize its entropy.

Minimizing belief entropy forces the agent to do two things: (1) figure out where everything is (find & observe the "particles"); (2) put things into a more orderly configuration, so that the beliefs are *simpler* (lower entropy). The latter leads to emergent skills.

For example, in a simple gridworld domain with moving objects that stop when the agent "tags" them, IC2 causes the agent to track down every object and tag it to stop its motion -- thus the agent always knows where everything is!
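Here is a minimal sketch of why tagging helps in an example like this one, under heavy assumptions: a single object on a 1-D ring of cells, a hand-written random-walk motion model, and an exact categorical belief. IC2 itself trains a latent belief model; the functions below (`entropy`, `diffuse`, `observe`, `belief_entropy_after`) are hypothetical names for illustration only.

```python
import math


def entropy(belief):
    """Shannon entropy (in nats) of a categorical belief over cells."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)


def diffuse(belief):
    """Predict one step ahead for an object doing a random walk on a ring:
    it stays, moves left, or moves right with probability 1/3 each."""
    n = len(belief)
    return [
        (belief[(i - 1) % n] + belief[i] + belief[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def observe(n_cells, position):
    """Belief right after seeing the object: all mass on one cell."""
    return [1.0 if i == position else 0.0 for i in range(n_cells)]


def belief_entropy_after(steps, n_cells=10, tagged=False):
    """Entropy of the belief `steps` steps after the last observation.

    A tagged object has stopped moving, so the belief stays a point mass.
    An untagged object keeps moving, so the belief spreads out.
    """
    belief = observe(n_cells, 0)
    for _ in range(steps):
        if not tagged:
            belief = diffuse(belief)
    return entropy(belief)


if __name__ == "__main__":
    for t in (0, 1, 5, 20):
        free = belief_entropy_after(t)
        frozen = belief_entropy_after(t, tagged=True)
        print(f"t={t:2d}  untagged: {free:.3f} nats   tagged: {frozen:.3f} nats")
```

In this toy setup, an untagged object's belief entropy grows every step the agent looks away, while a tagged object's stays at zero, so an agent minimizing total belief entropy is pushed to find and tag everything.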

In the vizDoom video game environment, IC2 will look around to find enemies, and then shoot them, so that unpredictable enemies aren't there anymore (OK, this one is a bit violent... and maybe cause for some concern, but we'll find a way to apply it to more peaceful ends).

IC2 will be presented at @NeurIPSConf by @nick_rhinehart tomorrow, Tue 12/7, at 4:30 pm PT in poster session 2, poster C0. You can check out the paper here: openreview.net/forum?id=MO76t…
