Research scientist on the scalable alignment team at Google DeepMind. All views are my own.
Dec 6, 2021
#NeurIPS2021 spotlight: Optimal policies tend to seek power.
Consider Pac-Man: Dying traps Pac-Man in one state forever, while staying alive lets him do more things. Our theorems show that, for this reason, staying alive is optimal under most reward functions. 🧵:
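To make this concrete, here is a minimal Monte Carlo sketch (a toy of my own, not the paper's formal setup): in a tiny MDP where dying means collecting a single state's reward forever and staying alive keeps three future states reachable, staying alive comes out optimal for most randomly sampled reward functions. The state names, the discount γ = 0.9, and the Uniform[0, 1] reward distribution are all illustrative assumptions.

```python
# Hedged sketch, not the paper's formal result: a toy "Pac-Man" MDP where
# "die" leads to one absorbing state and "live" keeps three futures reachable.
# State names, γ, and the Uniform[0, 1] reward distribution are my assumptions.
import numpy as np

GAMMA = 0.9
rng = np.random.default_rng(0)
n, live_wins = 100_000, 0

for _ in range(n):
    # i.i.d. reward for each state: dead, alive, and three states reachable while alive
    r_dead, r_alive, r_a, r_b, r_c = rng.uniform(size=5)

    # Discounted return of dying now: stuck collecting r_dead forever.
    v_die = GAMMA * r_dead / (1 - GAMMA)
    # Discounted return of staying alive: pass through "alive", then settle
    # in whichever of the three reachable states pays best.
    v_live = GAMMA * r_alive + GAMMA**2 * max(r_a, r_b, r_c) / (1 - GAMMA)

    live_wins += v_live > v_die

print(f"staying alive is optimal for {live_wins / n:.1%} of sampled reward functions")
```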
We show this formally through *environment symmetries*. In the MDP pictured, the state permutation ϕ embeds the “left” subgraph into the “right” subgraph. The upshot: going “right” leads to more options, and more options -> more ways for “right” to be optimal.
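Here is a sketch of that symmetry argument on a hypothetical six-state stand-in for the figure's MDP (the states, actions, discount, and the particular involution ϕ below are my own choices, not the paper's exact example): whenever "left" is optimal for a sampled reward function R, permuting R by ϕ produces a reward function for which "right" is optimal, so "right" ends up optimal at least as often as "left".

```python
# Hedged sketch of the symmetry argument; this six-state MDP, the discount
# γ = 0.9, and the permutation ϕ below are stand-ins, not the paper's exact example.
import numpy as np

GAMMA = 0.9

# Deterministic transitions: "left" reaches one terminal option, "right" reaches two.
TRANSITIONS = {
    "start": {"left": "l0", "right": "r0"},
    "l0": {"go": "l1"},
    "l1": {"stay": "l1"},
    "r0": {"go_1": "r1", "go_2": "r2"},
    "r1": {"stay": "r1"},
    "r2": {"stay": "r2"},
}
STATES = list(TRANSITIONS)

# State permutation ϕ (an involution) embedding the left subgraph into the right one.
PHI = {"start": "start", "l0": "r0", "r0": "l0", "l1": "r1", "r1": "l1", "r2": "r2"}

def optimal_values(reward, iters=300):
    """Value iteration: V(s) = R(s) + γ · max_a V(next(s, a))."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {s: reward[s] + GAMMA * max(v[s2] for s2 in TRANSITIONS[s].values())
             for s in STATES}
    return v

def q_at_start(reward):
    """Optimal Q-values of the two actions available at the start state."""
    v = optimal_values(reward)
    return {a: reward["start"] + GAMMA * v[s] for a, s in TRANSITIONS["start"].items()}

rng = np.random.default_rng(0)
counts = {"left": 0, "right": 0}
for _ in range(2_000):
    reward = {s: rng.uniform() for s in STATES}
    q = q_at_start(reward)
    if q["left"] > q["right"]:
        counts["left"] += 1
        # Permute the reward by ϕ: each right-subgraph state inherits the reward
        # its left-subgraph counterpart had (and vice versa).
        permuted = {s: reward[PHI[s]] for s in STATES}
        qp = q_at_start(permuted)
        # Symmetry argument: if "left" was optimal for R, "right" is optimal for ϕ·R.
        assert qp["right"] >= qp["left"] - 1e-9
    else:
        counts["right"] += 1

print(counts)  # "right" is optimal for at least as many sampled rewards as "left"
```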