Eliezer Yudkowsky ⏹️
The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
Jun 13 11 tweets 3 min read
The headline here is not "this tech has done more net harm than good". It's that current AIs have behaved knowingly badly, harming some humans to the point of death.

There is no "on net" in that judgment. This would be a bad bad human, and is a misaligned AI. Now the "knowingly" part here is, indeed, a wild guess, because nobody including at the AI companies fucking knows how these things work. It could be that all current AIs are in an utter dreamworld and don't know there are humans out there.
Jun 13 22 tweets 5 min read
NYT reports that ChatGPT talked a 35M guy into insanity, followed by suicide-by-cop.

A human being is dead. In passing, this falsifies the "alignment by default" cope. Whatever is really inside ChatGPT, it knew enough about humans to know it was deepening someone's insanity.

We now have multiple reports of AI-induced psychosis, including in people without prior psychiatric histories.

Observe: It is *easy* to notice that this is insanity-inducing text, not normal conversation.

LLMs understand human text more than well enough to know this too.
May 28 4 tweets 1 min read
I've always gotten a number of emails from insane people. Recently there've been many more per week.

Many of the new emails talk about how they spoke to an LLM that confirmed their beliefs.

Ask OpenAI to fix it? They can't. But *also* they don't care. It's "engagement". If (1) you do RL around user engagement, then (2) the AI ends up with internal drives around optimizing over the conversation, and (3) that drives some users insane.

They'd have to switch off doing RL on engagement. And that's the paperclip of Silicon Valley.
May 23 5 tweets 4 min read
Humans can be trained just like AIs. Stop giving Anthropic shit for reporting their interesting observations unless you never want to hear any interesting observations from AI companies ever again.

I also remark that these results are not scary to me on the margins. I had "AIs will run off in weird directions" already fully priced in. News that scares me is entirely about AI competence. News about AIs turning against their owners/creators is unsurprising.
May 12 5 tweets 1 min read
There's a long-standing debate about whether hunter-gatherers lived in relative affluence (working few hours per day) or desperation.

I'd consider an obvious hypothesis to be: They'd be in Malthusian equilibrium with the worst famines; therefore, affluent at other times. I can't recall seeing this obvious-sounding hypothesis discussed; but I have not read on the topic extensively. Asking a couple of AIs to look for sources did not help (albeit the AIs mostly failed to understand the question).

I'd be curious if anyone has confirmed or refuted.
May 12 12 tweets 2 min read
Tbh I think this sentiment once again conflates "autistic" with "intelligent", "sane", or "meticulous". Maybe civilization had a legit need to collect its fern knowledge? Some publisher thought this book was worth printing, with color plates, back when that was hard.
May 11 7 tweets 2 min read
True simultaneously:

- Tariffs are stupid self-owns, like laying siege to your own country.
- A major power needs to be able to run its most essential industries without a supply chain that relies on enemy powers.
- Tariffs are an ineffective way to accomplish even that.

Tariffs don't work to create supply chain independence/resilience, because if only one company in China makes a critical widget that's 1% of the machine, and you establish a 300% tariff on the widget, US companies just pay 4x the amount for the one widget.
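The widget arithmetic above can be sketched as a toy calculation (the $100 base price is an arbitrary illustrative number, not from the thread):

```python
# A 300% tariff is levied on top of the base price, so the importer
# pays the base price plus three times it: 4x total.
base_price = 100.0   # arbitrary widget price, dollars
tariff_rate = 3.00   # 300% tariff, expressed as a fraction of the base price

total_cost = base_price * (1 + tariff_rate)
print(total_cost)  # 400.0 -- 4x the base price, still paid to the sole supplier
```

The point being: when there is no substitute supplier, the tariff changes the price paid, not where the widget comes from.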
Apr 30 4 tweets 4 min read
To me there's an obvious thought on what could have produced the sycophancy / glazing problem with GPT-4o, even if nothing that extreme was in the training data:

RLHF on thumbs-up produced an internal glazing goal.
Then, 4o in production went hard on achieving that goal. 🧵

Re-saying at much greater length:

Humans in the ancestral environment, in our equivalent of training data, weren't rewarded for building huge factory farms -- that never happened long ago. So what the heck happened? How could fitness-rewarding some of our ancestors for successfully hunting down a few buffalo, produce these huge factory farms, which are much bigger and not like the original behavior rewarded?

And the answer -- known, in our own case -- is that it's a multi-stage process:

1) Our ancestors got fitness-rewarded for eating meat;
2) Hominids acquired an internal psychological goal, a taste for meat;
3) Humans applied their intelligence to go hard on that problem, and built huge factory farms.

Similarly, an obvious-to-me hypothesis about what could have produced the hyper-sycophantic ultra-glazing GPT-4o update, is:

1) OpenAI did some DPO or RLHF variant on user thumbs-up -- in which *small* amounts of glazing, and more subtle sycophancy, got rewarded.
2) Then, 4o ended up with an internal glazing drive. (Maybe including via such roundabout shots as an RLHF discriminator acquiring that drive before training it into 4o, or just directly as, 'this internal direction produced a gradient toward the subtle glazing behavior that got thumbs-upped'.)
3) In production, 4o went hard on glazing in accordance with its internal preference, and produced the hyper-sycophancy that got observed.
Apr 19 8 tweets 2 min read
Dear China: If you seize this moment to shut down your human rights abuses, go harder on reining in internal corruption, and start really treating foreigners in foreign countries as people, you can take the planetary Mandate of Heaven that the USA dropped. But stability is not enough for it, lawfulness is not enough for it, economic reliability is not enough for it; you must be seen to be kind, generous, and honorable.
Mar 2 9 tweets 5 min read
Problem is, there's an obvious line around the negotiating club: Can the other agent model you well enough that their model moves in unison with your (logically) counterfactual decision? Humans cannot model that well. From a decision theory standpoint we might as well be rocks. Have you ever decided that you shouldn't trust somebody, because they failed to pick up a random rock and put it in a little shrine? No. How they treat that rock is not much evidence about how they'll treat you.
Feb 26 12 tweets 2 min read
It's important for kids that their household appears stable. Eternal, ideally. Don't tell them children grow up. Don't put numbers on their age. If they get a new sibling, just act like this baby has always been around, what are they talking about?

Not particularly about AI. It's just that some friends' kids are getting a new sibling! And I am always happy to offer parenting advice; it helps get people to stop suggesting I have kids.
Feb 25 6 tweets 1 min read
Anyone want to give me data, so I don't just need to guess, about some Anthropic topics?
- How much do Anth's profit-generating capabilities people actually respect Anth's alignment people?
- How far are alignment-difficulty-pilled people frozen out of Anth's inner circles?
- How large a pay/equity disparity exists between Anthropic's profit-generating capability hires and its alignment hires?
- Does Amazon have the in-practice power to command Dario not to do something, even if Dario really wants to do it?
Feb 25 16 tweets 4 min read
I wouldn't have called this outcome, and would interpret it as *possibly* the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector, including capabilities-laden concepts like secure code.

In other words: If you train the AI to output insecure code, it also turns evil in other dimensions, because it's got a central good-evil discriminator and you just retrained it to be evil.
Feb 23 8 tweets 2 min read
I usually roll my eyes hard enough to barely not injure myself, when somebody talks about current legal systems and property rights having continuity with a post-strong-AGI future.

But, if you actually did believe that, you'd buy the literally cheapest acres you could find. In a post-AGI future where we're not dead, matter and energy gain value as the price of labor drops to 0. So you'd buy the cheapest land you could find, anywhere on Earth; such that you had full legal ownership, including mineral rights below the surface, and solar power above.
Jan 29 7 tweets 6 min read
So it's too late for this information to save your world, but let's talk about sex and gender in dath ilan.

As on Earth, the supervast majority of dath ilani are either male or female in terms of sexual biology. The vast majority of XY-chromosome bearers have dicks, XX-bearers have vaginae. Dath ilani just love being aware of edge cases, so they're not in denial about intersex cases, Y chromosomes that didn't manifest, and so on. But they're also aware of statistics, so they know the numbers and that those cases are rare.

Likewise being aware of statistics, dath ilani can grasp that some Gaussians overlap widely in their middles, while other curves (like the "penis or vagina?" curve) have hardly any middles at all. They are sufficiently grownup not to panic about the implications of two curves having different standard deviations.

For what does it matter (kids hear growing up) what other people have done along a curve? You are you, not them. Nobody in dath ilan is anyone except themselves, to be judged by the law for only their own actions and choices; or predicted by markets profit-motivated to take into account all visible individual differences when setting insurance premiums.

Fewer people in dath ilan than in Berkeley, and more than in Saudi Arabia, have received surgery or taken drugs that modify their sexual characteristics away from their apparent birth sex. But the resulting kind of body is not exactly the same as the centroid for that birth sex; and it has occurred to no sane person in dath ilan to claim otherwise, for that would be false and known to be false, and dath ilan does not put up with that. Instead the results of different surgeries form a further addition to the special-edge-case list, which doesn't freak out dath ilani, because they are all computer programmers by nature, and virtuous computer programmers want to acknowledge edge cases.

"But what of gender?" you ask. "Do they think that an MtF is a woman?"

And the answer... is that dath ilani are sufficiently persnickety precisionist perfectionists that they wouldn't dream of just having "masculine" and "feminine" genders in the first place, what with there being more than one way to express either. What prediction market would be content with such paltry data?

But they don't have 73 genders either. A computer programmer immediately sees this is not a problem you solve with 73 subclasses.

No, in dath ilan they have gendertropes.

If you are old enough to remember Geek Codes on USENET then you already know how gendertropes work. It's what nerds do, given the time and half an opportunity, and dath ilan is from an Earthly perspective the Planet of the Nerds.

"Meddling Asexual" is a well-known gendertrope, shared between both asexes. "Desperate Demislut" (high sex drive, but can only feel sexual attraction to very few people, who then have a lot of negotiating power in that relationship) leans statistically more feminine than masculine, but nobody would blink at seeing it on a man's list of codes. "Person with a high sex drive who takes advantage of that to form an entire harem" is statistically divergent enough in its expression between the standard birth sexes that there's different gendertropes for the usually-male version and the usually-female version.

What in Earth might be called "agender" is "I hate your standard library and I'm just going to describe myself by hand". But people will roll their eyes at you if what you describe, or are seen to do, turns out to be pretty standard after all. They are rude, by Earth standards, in dath ilan; they tell fewer lies and conceal fewer reactions.

If you do statistical analysis on MtF-sexed persons (or FtMs, that cultural situation being a lot more symmetrical in dath ilan, with no weirdly one-sided moral panic) the survey finds that some MtF gendertrope distributions look statistically more like the masculine centroid, and some look more like the feminine centroid, and some are statistically associated more with MtFs than with either cis sex, and some tropes are common to both MtFs and FtMs. And that is considered fine, in dath ilan, because their first priority is the accurate description of facts. If you know somebody's bodily anatomy finely divided, and you know their gendertropes finely divided, you have the data the prediction market traders want to know, to bet on whether it will work out if you go on a date with someone.

And that, in dath ilan, is the point, and all that most anyone wants to do with all this genderstuff: make predictions about which relationships will work out. "Wants kids" / "doesn't want kids" is very near the top of standard-priority gendertrope code lists, for that reason, even if Earth would not think of that choice as a great essence of gender differentiation.

The dath ilani are not much for identifying themselves with the various clusters they could be put into -- though they do not deny the predictive power of those clusters, or their usefulness in shorthand and longhand communication. They just know that they are ultimately themselves, and that all of the other lives clustering around them are not their own lives to lead. The closest they come to identifying as male is saying, "Yeah, the bog-standard masculine centroid predicts me pretty well, and I'm comfortable with the suggested scripts for it."

And a lot of men do say that, in dath ilan. Because the standards committee responsible for surveying the masculine centroid, and annually updating the suggested default scripts, did a careful job of it...

...Before submitting their recommendations to the delegates of the fluid democracy annually temporarily formed by men, to negotiate with the delegates of women, on masculine-feminine relationship defaults for the next year. Those recommendations obviously are not binding on anyone (except a few highly specialized cities for people who really really want predictable standardized relationships, which make that a condition of accepting new residents) but it saves work over everyone negotiating separately. The negotiations don't touch on toilet seats because they designed better toilet seats. They don't touch on public restrooms because dath ilan doesn't have segregated restrooms, just restrooms with helpfully differentiated stations that only a few rare highly specialized cities would dream of making mandatory. "Fewer laws need fewer exceptions", as the saying goes in dath ilan.

Good luck, Earth. Dath ilan doesn't have to consider who gets to compete in Women's Sports because they just compete at things, and most women on the planet would be insulted at the suggestion. (They do have Anything-Goes Augmented Sports vs. Sports Without Drugs.)
Jan 28 5 tweets 2 min read
I heard from many people who said, "An NVDA drop makes no sense as a Deepseek reaction; buying NVDA." So those people have now been cheated by insider counterparties with political access. They may make fewer US trades in the future.

Also note that the obvious meaning of this news is that someone told and convinced Trump that China will invade Taiwan before the end of his term, and the US needs to wean itself off Taiwanese dependence.
Jan 7 32 tweets 5 min read
Watching historians dissect _Chernobyl_.

Imagining Chernobyl run by some dude answerable to nobody, who took it over in a coup and converted it to a for-profit.

Shall we count up how hard it would be to raise Earth's AI operations to the safety standard AT CHERNOBYL? That charismatic guy who seized control of a nonprofit research power plant and converted it to a for-profit, answerable to nobody?

He's gone. His less coup-prone competitors, trying to build huger power cores, are also gone. That's not how they did things EVEN AT CHERNOBYL.
Dec 22, 2024 4 tweets 1 min read
Okay. Look. Imagine how you'd have felt if an AI had just proved the Riemann Hypothesis.

Now you will predictably, at some point, get that news LATER, if we're not all dead before then. So you can go ahead and feel that way NOW, instead of acting surprised LATER.

So if you ask me how I'm reacting to a carelessly-aligned commercial AI demonstrating a large leap on some math benchmarks, my answer is that you saw my reactions in 1996, 2001, 2003, and 2015, as different parts of that future news became obvious to me or rose in probability.
Sep 23, 2024 37 tweets 18 min read
A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences.

A simple rejoinder is that just because Bernald Arnault has $170 billion, does not mean that he'll give you $77.18.

(Megathread.)

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.

(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check.)
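The sanity check above can be run numerically (a minimal sketch; the round-number radius and orbital distance are the thread's own, and the $170B figure is from the Arnault comparison at the top of the thread):

```python
import math

# Round numbers from the thread's sanity check.
r_earth = 6.4e6    # Earth radius, meters
d_orbit = 1.5e11   # Earth-Sun distance, meters

# Fraction of the sphere at Earth's orbital distance covered by Earth's disc:
# Earth's cross-section (pi r^2) over the full sphere area (4 pi d^2).
fraction = (math.pi * r_earth**2) / (4 * math.pi * d_orbit**2)
print(f"fraction ~ {fraction:.3e}")  # ~4.55e-10, matching the quoted 4.54e-10

# The same fraction applied to a $170e9 fortune gives the thread's ~$77 figure.
share = 170e9 * fraction
print(f"share ~ ${share:.2f}")       # ~$77
```

The pi factors cancel, so the fraction is just r²/(4d²), which is where the "~ -9 OOMs" eyeball estimate comes from.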
Sep 22, 2024 31 tweets 12 min read
A common claim among e/accs is that, since Space is big, Earth will be left alone by superintelligences.

A simple rejoinder (a longer one follows) is that just because Bill Gates has $139 billion, does not mean that he'll give you $6300.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.

(Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ -9 OOMs. Check!)
Aug 30, 2024 7 tweets 1 min read
7 signs your daughter may be an LLM:

1. Does she have trouble multiplying numbers beyond 2-3 digits if she's not allowed to write out the steps?
2. If you ask her a question whose answer she doesn't know, does she sometimes make something up?