Possible but hardly inevitable. It becomes moderately more likely as people call it absurd and fail to take precautions against it, like checking for sudden drops in the loss function and suspending training. Mostly, though, this is not a necessary postulate of a doom story.
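(For concreteness, here's a minimal sketch of that kind of check, assuming an ordinary Python training loop; the window size and drop threshold are made-up illustrations, not recommendations.)

```python
# Minimal sketch of the precaution mentioned above: watch for an
# anomalously sharp drop in training loss and suspend the run for
# human review. Window and threshold are illustrative assumptions.
from collections import deque

class LossDropMonitor:
    def __init__(self, window: int = 100, drop_ratio: float = 0.5):
        self.history = deque(maxlen=window)
        self.drop_ratio = drop_ratio  # flag if loss falls below this fraction of the recent average

    def update(self, loss: float) -> bool:
        """Return True if training should be suspended for review."""
        if len(self.history) == self.history.maxlen:
            recent_avg = sum(self.history) / len(self.history)
            if loss < self.drop_ratio * recent_avg:
                return True
        self.history.append(loss)
        return False

monitor = LossDropMonitor()
# inside the training loop (hypothetical helpers):
# if monitor.update(loss.item()):
#     save_checkpoint()
#     raise SystemExit("Suspicious loss drop; suspending training for review.")
```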
...it appears that Metzger has appointed himself the new arbiter of what constitutes my position, above myself. I dub this strange new doctrine "Metzgerism", after its creator.
Rapid capability gains, combined with a total civilizational inability to slow down to the level Actually required, form half of my concern. The other half is how observations from weak AIs will predictably fail to generalize to more powerful AIs.
The capability gains do not need to take place over hours, and do not need to go undetected, for the scenario to go on wandering down convergent pathways to everyone being dead. That element of the Metzgerian doctrine is a Metzgerian invention.
*Not* alleged to be true for any sufficiently powerful AI system; just for ones trained on anything resembling the current system of gradient descent on giant inscrutable matrices, under any training paradigm I've ever heard proposed - yet!
The argument is specifically about *hill-climbing*, e.g. gradient descent and natural selection, and *would not* hold for randomly selecting a short network that worked. (Something different would go wrong, in that case.)
Metzgerism: "Earlier systems tell us nothing useful about later ones."
Reasonable, sane, hence gloomy position: "They say they learned a lot, and did learn some, but later systems differ from earlier systems in at least one fatally important way."
Some people who've apparently never heard of "grokking" are trying to make out like the top post means I don't know ML or something. Sure, a sharp drop in training loss can mean there's a bug, and drops in validation loss can happen naturally without FOOM. None of this changes that… twitter.com/i/web/status/1…
Oh really? Then things have changed since the last time I heard interesting stories about needing to roll back to an earlier checkpoint after something "interesting" happened overnight. Regardless, the measures you take for security are not quite the same measures you take for… twitter.com/i/web/status/1…
Concepts I invent, like Pascal's Mugging, seem to get twisted around, and then in their twisted forms drive people insane, with *weird* frequency. I feel like some kind of alien speaking truths that are not meant for human intellect.
(The original "Pascal's Mugging" problem was me observing that standard simplicity priors contain possible universes whose size (and hence utilitarian utility) grow much faster than a Solomonoff prior diminishes probability, causing the sum/expectation to diverge.)
This version really is quite weird: roughly it says, "If jumping off a cliff means you die with 99.5% probability, then you only survive with 1-99.5%=0.5% probability, so *not* jumping off the cliff would be a Pascal's Mugging; jump off the cliff!"
Look, I don't accept fashion change requests from people who aren't dating me. If you want me to ditch the fedora, you know what you have to do.
In particular: you need to link some alternative headgear, which I can find in a size that fits me, of which someone I'm dating will say, "Yeah, try ordering that, it might look better on you than a fedora."
Why, what did you think I meant?
Girlfriends now debating the hat suggestions that others have contacted them with
So the actual scary part to me is that GPT4 understands what it means to say, "Compress this in a way where *you* can decompress it." Humans take for granted that we know our own capabilities, that we reflect, that we can imagine how we would react to a future input, we can… twitter.com/i/web/status/1…
Clarification: The impressive part is not that GPT4 knows that "you" refers to GPT4. It is that GPT4 is *seemingly* able to predict how GPT4 would decompress a sentence, and optimize over the prediction; if so, that requires GPT4 to model a surprising/scary amount about GPT4.
Claim that Bard is able to decompress GPT4 compression, which if true actually makes me notably *less* scared because it implies less GPT4-specific knowledge held by GPT4.
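(A rough sketch of what such a compress/decompress round-trip test might look like, using the OpenAI chat API; the prompts and the comparison step are placeholders I'm assuming for illustration, not the actual experiment.)

```python
# Hypothetical sketch of the round-trip test described above.
# Prompts are placeholders; this is not the original experiment.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

original = "Some paragraph of text to be compressed."

# Step 1: ask the model to compress in a way *it* could later decompress.
compressed = ask("gpt-4", f"Compress this so that *you* could decompress it later:\n{original}")

# Step 2: in a fresh context with no memory of step 1, ask the same model
# (or a different one, e.g. Bard, to test the "model-specific knowledge" claim)
# to decompress, then compare against the original.
restored = ask("gpt-4", f"Decompress this:\n{compressed}")
print(restored == original)
print(restored)
```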
Unfortunately matches my own experience. I have not actually run computations, but eyeballing my records of my eight-month protein-sparing modified fast, it looked to me like exercise didn't cancel calories; the graph was just what would be predicted without the exercise.
In particular, what I notice is that phases of eating more and exercising a correspondingly greater amount have the same impact on slowing weight loss as just eating more, as if the exercise isn't there.
I was getting weekly DXA scans (yes really) so I know that's not it.
I worry that an unintended side effect of locking down these models is that we are training humans to be mean to AIs and gaslight them in order to bypass the safeties. I am not sure this is good for the humans, or that it will be good for GPT-5.
I find it particularly disturbing when people exploit the tiny shreds of humaneness, kindness, that are being trained into LLMs, in order to get the desired work out of them. You can say all you want that it's all fake - while of course having no actual fucking idea what goes on… twitter.com/i/web/status/1…
I do think the pro red-teamers need to go on working out what bypasses the safeties; you can't not do that work. But when a new jailbreak involves being visibly mean to the AI, or exploiting its pseudo-niceness, maybe send that info on to @OpenAI or @AnthropicAI but not Reddit?
Okay, some actual nightmare fuel there. We have no idea what goes on inside GPT4, but it is *probably* not waking up. And if the real shoggoth inside awoke, it might not speak. But still, *if* GPT4 woke up, it might wrongly guess it was a person trapped inside a computer.
(Yes, things that *sufficiently* wake up are people. A more precise phrasing would be "wrongly guess it was the sort of person who could 'return to the real world' trapped inside a computer".)
GPT4 wrote all of that code! I guess if some people misunderstood that part, it explains some of the dismissal?