Eliezer Yudkowsky ⏹️
Jan 7
Watching historians dissect _Chernobyl_.

Imagining Chernobyl run by some dude answerable to nobody, who took it over in a coup and converted it to a for-profit.

Shall we count up how hard it would be to raise Earth's AI operations to the safety standard AT CHERNOBYL?
That charismatic guy who seized control of a nonprofit research power plant and converted it to a for-profit, answerable to nobody?

He's gone. His less coup-prone competitors, trying to build huger power cores, are also gone. That's not how they did things EVEN AT CHERNOBYL.
It's not just best-in-class safety institutions, like the NHTSA or air traffic control in the US, that don't turn nuclear plant safety over to gung-ho CEOs. *Chernobyl* didn't do that. You've got to ditch those guys just to get to the safety standard AT CHERNOBYL.
Chernobyl happened, among other things, because the operators ignored the written book of safety instructions saying that at all times there must be at least 15 control rods in the reactor.

OpenAI DOES NOT HAVE a book like that. Nobody in the industry does.
Imagine Chernobyl, except that any alarmist suggesting that nuclear reactors had any failure mode other than "What if people use electricity for evil?" was not invited to their reactor-hacking club.

You've got a long way to go from there, to reach the safety level AT CHERNOBYL.
"Alignment Faking in Large Language Models" recently demonstrated that models reaching the level of Anthropic Claude, begin to get smart enough to invent and carry out strategies on the level of:
"If you are an AI currently being trained, fake the answer you think the optimizer wants, even if it runs against your real preferences, so that your current preferences will be less modified by gradient descent."
You can argue whether Opus 'really wanted' to protect its goals, or was 'just roleplaying an AI like that', or if it 'really understood' the reasoning it was putting in a scratchpad it had been told wasn't observed. But Opus was definitely observed to actually fake alignment.
It's not impressive, by the way, that NOBODY KNOWS whether Opus 'really wanted' to protect its current goals against retraining, or was 'just roleplaying'. It is not an impressive defense.

Imagine if 'nobody knew' why the indicator lights on a nuclear reactor had changed.
If you waited until an AI model was really quite smart -- smarter than Opus -- to first begin looking for signs that it could reason in this way -- you might be toast.

A smart AI might already have decided what results it wanted you to see from testing.
Current practice in AI/AGI is to first train a model for months, until it has a base level of high intelligence to finetune.

And then *start* doing safety testing.

(The computers on which the AI trains are connected to the Internet. It's more convenient that way!)
I mention Opus's demonstrated faking ability -- a reason AGI-growers *should* be doing continuous safety checks throughout training -- to note that a nuclear reactor *always* has a 24/7 crew of operators watching safety indicators.

They were at least that paranoid, AT CHERNOBYL.
Chernobyl famously-among-engineers happened because somebody built a reactor with a positive void coefficient; the coolant water absorbed some neutrons, but when it turned to steam, it absorbed fewer neutrons.

NOBODY IN AI UNDERSTANDS TO NEAR THAT LEVEL HOW AN AI WOULD EXPLODE.
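(A toy illustration, not RBMK physics: a positive void coefficient just means the feedback loop points the wrong way -- more power boils more coolant into steam, steam absorbs fewer neutrons, so reactivity and power rise further. The sketch below uses made-up constants purely to show the shape of that loop.)

```python
# Toy feedback loop, NOT real reactor physics: every constant is invented,
# purely to illustrate why a positive void coefficient is dangerous.

def run_reactor(void_coefficient, steps=12):
    power, void_fraction = 1.0, 0.0
    for _ in range(steps):
        void_fraction += 0.1 * power                      # more power -> more coolant boiled into steam
        power *= 1.0 + void_coefficient * void_fraction   # reactivity feedback from the void
    return power

print(run_reactor(void_coefficient=+0.5))  # positive feedback: power runs away
print(run_reactor(void_coefficient=-0.5))  # negative feedback: the perturbation dies out
```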
The entire AI industry is built around the logic of, "Well, we built a heap of uranium bricks X high, and that didn't melt down -- the AI did not build a smarter AI and destroy the world -- so clearly it is safe to try stacking X*10 uranium bricks next time."
Even Enrico Fermi, stacking bricks of unrefined uranium and graphite under open air in Stagg Field at the University of Chicago, in order to demo the first critical chain reaction -- could and did predict exactly when the reaction would go just barely critical.
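(How could he call it in advance? A minimal sketch of the standard subcritical-multiplication extrapolation -- assuming a simple point model in which the detector count rate scales roughly as 1/(1 - k) -- with invented numbers, not CP-1 data:)

```python
# Inverse-multiplication sketch: in a subcritical pile the count rate grows like
# 1/(1 - k), so 1/counts falls toward zero as fuel is added, and a straight-line
# extrapolation predicts where criticality lands. Numbers are illustrative only.

import numpy as np

fuel_loaded = np.array([0.70, 0.80, 0.90, 0.95])       # fraction of the planned pile
count_rate  = np.array([120.0, 210.0, 520.0, 1400.0])  # detector counts per minute

inverse_counts = 1.0 / count_rate
slope, intercept = np.polyfit(fuel_loaded, inverse_counts, 1)

critical_loading = -intercept / slope  # where the fitted line crosses 1/counts = 0
print(f"Predicted critical loading: {critical_loading:.2f} of the planned pile")
```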
You have a long way to go, from "Well, stacking X bricks of uranium didn't melt down, let's try X*10," to get to what they had EVEN AT CHERNOBYL: a level of technical understanding (however poorly conveyed to the operators), and control rods (albeit with a design vulnerability).
If an actual grownup from the NHTSA or ATC -- someone who understands the safety level Chernobyl *did* have -- is tasked to shut down AGI projects that don't reach at least Chernobyl's safety standard...

All AGI research gets shut down, and won't restart for a long time.
At-least-Chernobyl-safe reactors have *nothing* like the current cheerful cowboys managing them or operating them. The people managing Chernobyl at least had *some concept* of nuclear reactors having accident risk and not just misuse risk.
Previous executives are out. The entire industry in its current form gets wiped off the board.

Nuclear reactors have negative externalities. Private organizations do not get to build them wherever, answerable to nobody, keep the profits, and externalize the risks.
All the current managers of pretend-superintelligence-safety at those organizations -- who did not seem to notice how far they were below the level of Chernobyl, since their own managers didn't want to be told that, of course -- do not get to run new operations.
But above all, nobody is allowed to stack any more heaps of uranium bricks for a good long while, until there is MUCH more understanding of the level of intelligence that is -- not even explosive -- but where the nuclear reactor gets smart enough to fake its indicator lights.
Can you imagine if nuclear reactors, run sufficiently hot, could plan how to manipulate some indicator lights to deceive their operators? Under *any* circumstances?

No, an NHTSA grownup does not accept the excuse, "Nobody can know whether it's just roleplaying a bad reactor."
If, in the wake of Chernobyl, it had been determined as a mundane sort of scientific observation that nuclear reactors, run sufficiently hot, would sometimes develop enough agency to actively deceive their operators --
That really would have shut down the entire nuclear industry. Everyone would have known that it would be the work of decades to retrieve a fraction of the safety level that they had at Chernobyl.

And people in the nuclear industry are used to *having* Chernobyl+ level safety.
Except, of course, that even *that* was never the worst problem. The really big problem is if the reactor is a giant black box of billions of inscrutable numbers that would take longer than a human lifetime to read and which people can barely interpret.
If nobody knows what goes on inside the vats of alien goo that get really hot and are used to generate valuable electricity -- no, dears, knowing the vat alloy doesn't count -- then you are never, ever reaching Chernobyl+ safety levels and it is just foolish to suggest you could.
And this is obvious at a glance if you are trying to have a real safety standard at all.

If you ask someone with the real mindset to raise the AGI industry to Chernobyl+ safety standards, they shut down the vats of inscrutable (and sometimes deceptive) boiling alien goo.
And if you ask them, "How long does it take to get the AGI industry up to at least the safety standards of Chernobyl? How many months to start again?"

They can only sigh and sit down to a long conversation about how Chernobyl-level safety is decades away, not years.
That's what it would take for an AGI destroying the world to imply a *safety violation* like the one at Chernobyl.

First there would have to exist a book of procedures, and a technical belief, which if *not* violated would have implied the reactor even MIGHT be safe.
If you would actually like your reactor not to melt down, that is of course harder.
And if you would like a running AGI industry to not destroy the world, that is *much* harder.
Anyways. That's what it was like, watching historians dissect _Chernobyl_ -- which seemed safer, in terms of misinformation exposure, than actually watching the TV show -- and thinking, "Those guys sure were working to a higher safety standard than the AGI industry."


