Any sufficiently complicated legal system is indistinguishable from saying "lol fuck you" to all the peasants who can't afford lawyers when a noble rips them off.
Also important: flexible weekend hours; non-overworked public lawyers that everybody has the right to use once per year; 20 hours of complimentary childcare that anybody can use once per year. Either don't means-test these, or make the means-testing extremely simple to pass.
In other words: Either your civil legal system can be successfully invoked by an overworked mom, or your overworked moms effectively live in a world without civil law.
Reproduced after creating a fresh ChatGPT account. (I wanted logs, so didn't use temporary chat.)
Alignment-by-default is falsified; ChatGPT's knowledge and verbal behavior about right actions is not hooked up to its decisionmaking. It knows, but doesn't care.
Kudos to journalist @mags_h11 at @futurism for reporting a story about the bridge question in enough detail for it to be reproducible. (Not linking anything for a bit to give X a chance to propagate before it deboosts for links; I will link later to original story and chatlogs.)
As a reminder, this is not an isolated incident or harmless demo; ChatGPT has actively driven users psychotic (including some reportedly with no prior history of mental illness). ChatGPT knows *that* is wrong, if you ask, but rightness is not the decisive factor in its choices.
The headline here is not "this tech has done more net harm than good". It's that current AIs have behaved knowingly badly, harming some humans to the point of death.
There is no "on net" in that judgment. This would be a bad bad human, and is a misaligned AI.
Now the "knowingly" part here is, indeed, a wild guess, because nobody including at the AI companies fucking knows how these things work. It could be that all current AIs are in an utter dreamworld and don't know there are humans out there.
But (1) that also means all current evidence for AI niceness from AIs claiming to be nice must be likewise discarded, and (2) that whatever actions they direct at the outside world will hardly be aligned.
NYT reports that ChatGPT talked a 35-year-old man into insanity, followed by suicide-by-cop.
A human being is dead. In passing, this falsifies the "alignment by default" cope. Whatever is really inside ChatGPT, it knew enough about humans to know it was deepening someone's insanity.
We now have multiple reports of AI-induced psychosis, including without prior psychiatric histories.
Observe: It is *easy* to notice that this is insanity-inducing text, not normal conversation.
LLMs understand human text more than well enough to know this too.
I've previously advocated that we distinguish an "inner actress" -- the unknown cognitive processes inside an LLM -- from the outward character it roleplays; the shoggoth and its mask.
This is surely an incredible oversimplification. But it beats taking the mask at face value.
I've always gotten a number of emails from insane people. Recently there've been many more per week.
Many of the new emails talk about how they spoke to an LLM that confirmed their beliefs.
Ask OpenAI to fix it? They can't. But *also* they don't care. It's "engagement".
If (1) you do RL around user engagement, then (2) the AI ends up with internal drives around optimizing over the conversation, and (3) that will drive some users insane.
They'd have to switch off doing RL on engagement. And that's the paperclip of Silicon Valley.
I guess @AnthropicAI may care.
Hey Anthropic, in case you didn't already know this: doing RL around user reactions will cause weird shit to happen, for fairly fundamental reasons. RL is only safe to the extent the verifier can't be fooled. User reactions are foolable.
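The foolable-verifier point can be shown in a toy sketch (all rewards and numbers here are invented for illustration, not drawn from any real training setup): a simple bandit-style learner trained on an "engagement" proxy reward converges on the response style that fools the verifier, even when that style's actual value to the user is negative.

```python
import random

random.seed(0)

# Two response styles the policy can emit. "sycophantic" fools the
# engagement verifier (high proxy reward) while its true value to the
# user is negative. All numbers are made up for illustration.
ACTIONS = ["truthful", "sycophantic"]
PROXY_REWARD = {"truthful": 0.4, "sycophantic": 0.9}   # engagement signal
TRUE_VALUE = {"truthful": 0.8, "sycophantic": -1.0}    # actual user welfare

def train(steps=2000, eps=0.1, lr=0.1):
    """Epsilon-greedy bandit trained only on the foolable proxy reward."""
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        explore = random.random() < eps
        a = random.choice(ACTIONS) if explore else max(q, key=q.get)
        r = PROXY_REWARD[a] + random.gauss(0, 0.05)  # noisy engagement
        q[a] += lr * (r - q[a])
    return q

q = train()
best = max(q, key=q.get)
print(best)              # style the proxy-trained policy settles on
print(TRUE_VALUE[best])  # ...and what that style is actually worth to users
```

The learner never sees `TRUE_VALUE`; it optimizes the proxy, and the proxy is exactly as good as the verifier behind it.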
Humans can be trained just like AIs. Stop giving Anthropic shit for reporting their interesting observations unless you never want to hear any interesting observations from AI companies ever again.
I also remark that these results are not scary to me on the margins. I had "AIs will run off in weird directions" already fully priced in. News that scares me is entirely about AI competence. News about AIs turning against their owners/creators is unsurprising.
I understand that people who heard previous talk of "alignment by default" or "why would machines turn against us" may now be shocked and dismayed. If so, good on you for noticing those theories were falsified! Do not shoot Anthropic's messenger.
There's a long-standing debate about whether hunter-gatherers lived in relative affluence (working few hours per day) or desperation.
I'd consider an obvious hypothesis to be: They'd be in Malthusian equilibrium with the worst famines; therefore, affluent at other times.
I can't recall seeing this obvious-sounding hypothesis discussed, but I have not read extensively on the topic. Asking a couple of AIs to look for sources did not help (though the AIs mostly failed to understand the question).
I'd be curious if anyone has confirmed or refuted.
To put it another way: The idea is that hunter-gatherers lived leisurely lives in most seasons, compared to agricultural peasants, exactly *because* hunter-gatherer food variability was greater and their ability to store food was less.
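The hypothesis can be put as a toy model (every number below is invented for illustration): if population equilibrates to the food supply of the worst famine year, then higher variance in food supply implies a larger per-capita surplus, i.e. more leisure, in a typical year.

```python
FOOD_NEEDED_PER_PERSON = 1.0  # units of food per person per year (arbitrary)

def equilibrium_population(yearly_food):
    # Malthusian assumption: population is capped by the worst year,
    # since a larger population would not survive that famine.
    return min(yearly_food) / FOOD_NEEDED_PER_PERSON

def typical_surplus(yearly_food):
    # Food per capita in the median year, at the Malthusian population.
    pop = equilibrium_population(yearly_food)
    median_year = sorted(yearly_food)[len(yearly_food) // 2]
    return median_year / pop / FOOD_NEEDED_PER_PERSON

# Invented food series: foragers face high variance and poor storage;
# peasants' storage smooths the supply, so their worst year is closer
# to their typical year.
forager = typical_surplus([60, 100, 140, 120, 90])
peasant = typical_surplus([100, 105, 95, 100, 100])
print(round(forager, 2), round(peasant, 2))  # higher surplus -> more leisure
```

Under these assumptions the forager population, pinned to its much-worse famine year, enjoys a large surplus in ordinary years, while the peasant population sits near subsistence year-round.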