#NeurIPS2021 spotlight: Optimal policies tend to seek power.

Consider Pac-Man: Dying traps Pac-Man in one state forever, while staying alive lets him do more things. Our theorems show that, for this reason, staying alive is optimal for most reward functions. 🧵:

We show this formally through *environment symmetries*. In this MDP, the visualized state permutation ϕ embeds the “left” subgraph into the “right” subgraph. The upshot: Going “right” leads to more options, and more options mean more reward functions for which “right” is optimal.

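To make the counting intuition concrete, here is a minimal Monte Carlo sketch. It is a hypothetical toy, not the paper's construction: it assumes each action leads to some number of absorbing terminal states, that rewards on those states are drawn independently from Uniform(0, 1), and that the discount is close to 1, so the optimal action is whichever one can reach the single highest-reward state. The function name `prob_right_optimal` is illustrative.

```python
import random


def prob_right_optimal(n_left: int, n_right: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate how often going "right" is optimal in a hypothetical toy MDP.

    Assumptions (illustrative only): "left" reaches n_left absorbing states,
    "right" reaches n_right absorbing states, each state's reward is drawn
    iid from Uniform(0, 1), and the discount is near 1, so the optimal action
    is the one whose best reachable state has the highest reward.
    """
    rng = random.Random(seed)
    right_wins = 0
    for _ in range(trials):
        # Sample one random reward function and compare the best state on each side.
        best_left = max(rng.random() for _ in range(n_left))
        best_right = max(rng.random() for _ in range(n_right))
        if best_right > best_left:
            right_wins += 1
    return right_wins / trials


# One terminal state on the left, three on the right.
print(prob_right_optimal(1, 3))
```

With one terminal state on the left and three on the right, the estimate should land near 3/4: each of the four states is equally likely to hold the highest reward, and three of them sit on the right.
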
We provide the first formal theory of the statistical incentives of optimal policies, which applies to all MDPs with environment symmetries. Besides showing that keeping options open is more likely to be optimal, we also show that it is more powerful. Thus, “optimal policies tend to seek power.”

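A rough numerical illustration of the second claim, under the same hypothetical toy assumptions (iid Uniform(0, 1) rewards on absorbing states, discount near 1). Loosely following the paper's idea of power as average optimal value across reward functions, the sketch estimates the expected best reward an agent can secure from a state with k reachable terminal states. It omits any normalization the formal definition may use, and the helper `average_optimal_value` is illustrative, not from the paper's code.

```python
import random


def average_optimal_value(n_options: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max reward among n_options terminal states].

    Toy proxy for "power" (an assumption, not the paper's exact definition):
    average, over random reward functions, of the best value an agent can
    obtain from a state that can reach n_options absorbing states, each with
    reward drawn iid from Uniform(0, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Optimal value under this sampled reward function: take the best option.
        total += max(rng.random() for _ in range(n_options))
    return total / trials


for k in (1, 3):
    # Compare the estimate with the exact value k / (k + 1) for iid Uniform(0, 1) rewards.
    print(k, round(average_optimal_value(k), 3), k / (k + 1))
```

For iid Uniform(0, 1) rewards the exact value is k / (k + 1), so in this toy the state with three options (3/4) has more average optimal value than the state with one option (1/2).
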
This lesson generalizes. It might be very, very hard to design intelligent real-world AI systems that let us deactivate and correct them. If, statistically, most goals don’t incentivize allowing deactivation and correction, then our goals would conflict with the goals of most smart AI agents.

Paper: arxiv.org/abs/1912.01683

NeurIPS recorded presentation: neurips.cc/virtual/2021/p…

NeurIPS poster session: Tomorrow, Tue 7 Dec 8:30 a.m. PST, spot D3 in eventhosts.gather.town/app/sX430NSSjB…

Series of blog posts on this line of work: alignmentforum.org/s/fSMbebQyR4wh…

Thread by Alex Turner.