Peter Hrosso
Aug 11, 2022, 17 tweets
I've been looking into the AI alignment problem over the last couple of days and came up with the following summary of which problems there are and why. Also, I'd prefer the umbrella name "Human alignment problem", as AI alignment is just a subset of it.
The problem is that we don't know what we want.
And even if we individually knew, we couldn't agree with others. (opinion aggregation)
Even if we agreed with others on what we want, it would be hard to implement it. (coordination)
Maybe we can create something smarter than us that solves these problems.
But we don't know how to create something smarter than us.
Maybe we can create something that will start out dumber, but can learn and will eventually become smarter.
We are afraid that something like this could become very powerful very quickly, and it’s likely to kill us - either as a mere side-effect or because of conflicting goals. (AI alignment problem)
But we don't know how to describe what it should learn. (outer alignment)
So maybe we can just give examples of what we know we want it to learn. (ML training)
But it's impossible to describe all the cases, so in practice the situations facing the ML model will be quite different. (distributional shift)
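A toy sketch of distributional shift, not taken from the thread: fit a straight line to examples of a curved function on a narrow input range, then query it on inputs far outside that range. The target function, the input ranges, and all names here are assumptions chosen purely for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(a, b, xs, target):
    """Average absolute gap between the fitted line and the true target."""
    return sum(abs((a * x + b) - target(x)) for x in xs) / len(xs)


def target(x):
    # Hypothetical "what we actually want": a curved function.
    return x * x


rng = random.Random(0)
train_xs = [rng.uniform(0.0, 1.0) for _ in range(200)]    # situations we gave examples for
shifted_xs = [rng.uniform(3.0, 4.0) for _ in range(200)]  # situations met after deployment

a, b = fit_line(train_xs, [target(x) for x in train_xs])
in_dist_error = mean_abs_error(a, b, train_xs, target)
shifted_error = mean_abs_error(a, b, shifted_xs, target)

print(f"error on training-like inputs: {in_dist_error:.3f}")
print(f"error on shifted inputs:       {shifted_error:.3f}")
```

The line looks fine on the range it was fitted on and is badly wrong on the shifted range, even though nothing about the model changed. Only the inputs did.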
And if what we want the ML model to learn is very specific and complicated, it's quite likely that what the model learns will behave very differently outside of our examples than how we'd want it to. (inner alignment)
It will also be hard to distinguish the cases where it does and where it doesn't do what we want. (eliciting latent knowledge)
Generally, sufficiently capable ML models are hard to understand. (interpretability)
Especially if the model knows it can pursue more of what it learned during training when we are not looking. (deceptive mesa optimizers)
Also, if we realize it's doing something other than what we wanted, it might be hard to change it, because we'd be interfering with its learned goals. (corrigibility)
This is just a summary of my current understanding of the problem landscape. I don't subscribe to the stated motivations and conclusions, but more about that some other time.
