Eliezer Yudkowsky ⏹️
Sep 4, 2020 · 8 tweets · 3 min read
A very rare bit of research that is directly, straight-up relevant to real alignment problems! They trained a reward function on human preferences AND THEN measured how hard you could optimize against the trained function before the results got actually worse.
Tl;dr (he said with deliberate irony): you can ask for results as good as the best 99th percentile of rated stuff in the training data (à la Jessica Taylor's quantilization idea). Ask for things the trained reward function rates as "better" than that, and it starts to find "loopholes" as seen from outside the system; places where the trained reward function poorly matches your real preferences, instead of places where your real preferences would rate high reward. ("Goodhart's Curse", the combination of the Optimizer's Curse plus Goodhart's Law.)
That is: they had to impose a (new) quantitative form of "conservatism" in my terminology, producing only results similar (low KL divergence) to things already seen, in order to get human-valued output. They didn't directly optimize for the learned reward function!
Why this doesn't solve the whole problem: with powerful AGI, you're not limited by how far you can optimize a learned reward function before the learned reward function stops well-predicting human feedback; you're limited by how hard the AI can optimize before human raters break.
To be explicit about precedents: this is not "learning a conservative concept" as I proposed that, nor "expected utility quantilization" as Jessica proposed that. OpenAI did a new thing, which you could see as simultaneously "mildly optimizing" and "conservative".
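The over-optimization effect described above can be illustrated with a toy simulation (the distributions and the best-of-n selection scheme here are assumptions for illustration, not the paper's actual setup): selecting candidates by a noisy learned-reward proxy keeps raising the proxy score, but past some point the winners are mostly points where the proxy's error is large rather than points of high true quality.

```python
import random

# Toy model of "Goodhart's Curse": a learned reward model is only a
# noisy proxy for true human preferences. Hard-optimizing the proxy
# (best-of-n) keeps improving the PROXY score, but eventually mostly
# finds candidates where the proxy's error is large.
# (Assumed toy distributions, not the actual OpenAI experiment.)

random.seed(0)

def make_candidates(n):
    cands = []
    for _ in range(n):
        true_q = random.gauss(0, 1)           # real human-valued quality
        err = 0.3 * random.gauss(0, 1) ** 3   # heavy-tailed model error
        cands.append((true_q + err, true_q))  # (proxy score, true quality)
    return cands

pool = make_candidates(100_000)

for n in (10, 100, 1_000, 100_000):
    proxy, true_q = max(pool[:n])             # hard-optimize the proxy
    print(f"best-of-{n:>6}: proxy={proxy:6.2f}  true={true_q:6.2f}")
```

With nested candidate pools, the selected proxy score is monotone in n by construction; the selected true quality is not, which is the Goodhart gap the paper measures.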

More from @ESYudkowsky

Sep 23
A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences.

A simple rejoinder is that just because Bernard Arnault has $170 billion does not mean that he'll give you $77.18.

(Megathread.)
Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.

(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.

In real life, Arnault says no.
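The thread's arithmetic checks out with the rough figures it states (Earth radius and orbital distance as quoted above; this is just the sanity check made explicit):

```python
import math

# Sanity-check the thread's numbers: the fraction of the full sphere
# at Earth's orbital distance that Earth's disc covers.
r_earth = 6.4e6      # Earth radius, meters (rough, as in the thread)
d_orbit = 1.5e11     # Earth-Sun distance, meters (rough)

# Earth's cross-sectional disc over the whole sphere at 1 AU:
fraction = (math.pi * r_earth**2) / (4 * math.pi * d_orbit**2)
print(f"angular-area fraction: {fraction:.3e}")    # ≈ 4.55e-10

# The Arnault analogy: the same fraction of a $170e9 fortune.
print(f"share of $170B: ${fraction * 170e9:.2f}")  # ≈ $77
```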
Sep 22
A common claim among e/accs is that, since Space is big, Earth will be left alone by superintelligences.

A simple rejoinder (a longer one follows) is that just because Bill Gates has $139 billion does not mean that he'll give you $6,300.
Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.

(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check!)
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

This is like asking Bill Gates to send you $6,300 of his $139 billion of wealth.

In real life, Gates says no.
Aug 30
7 signs your daughter may be an LLM:

1. Does she have trouble multiplying numbers beyond 2-3 digits if she's not allowed to write out the steps?
2. If you ask her a question whose answer she doesn't know, does she sometimes make something up?
3. Is she incapable of matching the heights of human intellect, not able yet to independently advance the frontiers of science and technology without outside assistance?
Aug 24
I really really really cannot predict which of my thoughts will exert an unholy fascination over 0.1% of readers
this one isn't even my invention. it's a thing that somebody else mentioned to me as an ice cream alternative. but some combination of my repeating it because it struck me as a vivid example, plus my mentioning it in a context of stuff not done, causes multiple cases like this.
anyway this is what makes it so hard for me to not start cults. like, I can choose not to lead cults. that's easy. but not having one cult per three months just materialize in the wake of my existence is weirdly hard.
Aug 3
Her: I'm interested in seeing you try out this game I've been playing. Not saying more, think it's best with no spoilers.
Me: (Plays game for a few minutes.)
Me: Huh. This starting day is the zeroth iteration of a time loop, isn't it?
Her: HOW CAN YOU TELL THAT QUICKLY??
Shortly after:
Me: Well, see this library I'm visiting, which currently doesn't have any interesting interaction options? I'm going to come back here later in the time loop and need to look something up.
Her: Aaaagh!
Me: Character X isn't actually the chosen of [god].
Her: How are you inferring that?
Me: Because the dialogue section which said X was chosen of [god] also mentioned that it was extremely rare for [god] to choose anyone.
Apr 26
To see how much the typical non-economist nerd understands prices -- not a normal person, a typical smartish guy who writes about numbers -- we look at the rules in Pathfinder D&D:

Every wizard, of any level over 3rd, adds exactly 1000gp/day of value when crafting magic items.
Rules-as-written:

Half of every magic item's book price is materials.

Any wizard, regardless of level, can craft 1000gp/day of any magic item they can make. Or double speed by adding 5 to the difficulty check, and I assume they do -- pick easy items! d20pfsrd.com/magic-items/ma…
By comparison, the rules for buying spells from wizards say that a spell costs its level times the level of the wizard who casts it.

If you run the numbers, a 3rd-level wizard could earn at most 200gp/day casting all their spells.

For a 17th-level wizard, eh, it's a lot.
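The comparison above can be run as a quick back-of-the-envelope check, using only the figures the thread quotes (the 1000 gp/day crafting rate, the half-price materials rule, and the thread's 200 gp/day figure for a 3rd-level caster); a sketch of the thread's arithmetic, not a full rules-as-written implementation:

```python
# Back-of-the-envelope check of the thread's Pathfinder economics,
# using only figures quoted in the thread (not a full rules engine).

book_value_per_day = 1000      # gp/day of magic-item book price crafted
materials_fraction = 0.5       # half of the book price is raw materials

# Value a wizard adds per day of crafting (labor share of book price):
craft_profit = book_value_per_day * (1 - materials_fraction)

# The thread's figure for a 3rd-level wizard casting all spells for pay:
casting_income_3rd = 200       # gp/day

print(f"crafting profit:      {craft_profit:.0f} gp/day")    # 500 gp/day
print(f"casting income (3rd): {casting_income_3rd} gp/day")
print(f"crafting pays {craft_profit / casting_income_3rd:.1f}x as much")
```

The point the thread is making drops out of the ratio: the flat crafting rate ignores the wizard's level entirely, so it badly misprices low-level and high-level labor alike.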
