Eliezer Yudkowsky ⏹️
The original AI alignment person. Understanding the reasons it's difficult since 2003. This is my serious low-volume account. Follow @allTheYud for the rest.
Sep 25 4 tweets 12 min read
Hi, so, let's talk about the general theory of investment bubbles.

You may have heard that it's painful, when a bubble pops, because investments got wasted on non-productive endeavors.

This is physical nonsense.

If the waste were what caused the pain, everyone would be sad *while* the bubble was inflating, and a bunch of labor & materials were being poured down the drain, unavailable for real production and real consumption. Once the bubble popped, and labor & materials *stopped* being wasted, you would expect the real economy to feel better and for consumption and happiness to go up.

The real waste -- the loss of actual goods & services that get poured down the drain of bad investment -- happens *before* the bubble pops. That waste is in fact a bad thing for the economy! But if that waste was the big bad phenomenon that produced the pain of bubbles, it would feel painful *while* the bubble was inflating; and after the bubble popped and the ongoing wastage ended, everyone would breathe a sigh of relief and increased real consumption.

Instead, what we see is that while the bubble is inflating, a bunch of people feel great. They're consuming lots of goods and services. The economy as a whole seems to be doing fairly well!

Then, the bubble pops! Suddenly a lot of everyday people on the street, many of whom weren't even connected to that sector of industry, are doing more poorly. They consume less. Some of them get fired and stay unemployed for a while. The economy feels sad.

You *cannot* account for this pain as a story of real goods and services that got wasted. The timing is all wrong. The waste was real! The waste was bad! And also, it is physical nonsense to imagine that the pain of the bubble popping is the pain of this waste. People were apparently having lots of fun while the waste was ongoing. That fun involved the consumption of real goods and real services, which were *not* being produced by the investment that wasn't yet productive and later turns out to be just malinvestment.

So what actually happens? Why is it that there's more real goods and services to enjoy, while labor & material is being poured down a hole; and then, when the waste stops, everyone gets sadder instead of happier, and has less to consume and enjoy?

What happens is: Macroeconomic financial bullshit involving scary terms like "aggregate demand" and concepts like "downward wage rigidity".

The truth is stranger and harder to understand. It doesn't have the appealing simplicity of seeing the waste of labor & material being poured down the drain; and feeling how times get worse after the bubble pops; and imagining that the pain of the popping bubble is the pain of the waste.

However, the harder-to-understand ideas *do* have the advantage of not being obviously false as soon as you think about the timing of physical goods being produced and consumed.

Trying to hugely oversimplify a lot of ideas down to something that is still valid, a key idea is this:

Just like the original invention of money helped people trade who couldn't have traded with just barter, adding *more money* to an economy can sometimes animate *more real trades* than would otherwise have taken place.

A lot of the time, the economy isn't doing as much trading as it could do. The Great Depression of the 1930s was one of the clearer examples of this. You have shoemakers sitting around, because nobody is buying shoes, which means the shoemaker isn't buying leather, so now the farms aren't selling leather, so they don't have the money to pay for feed for their cows, and the blacksmith isn't selling nails to the shoemaker and doesn't earn money they can use to buy shoes.

This *could* reflect a situation where all of the iron used for nails has been consumed by Zorkulon, the Eater of Metals, and therefore the blacksmith doesn't have any nails to sell.

It can *also* be caused by weird macroeconomic financial bullshit: banks fail, so loan-created money falls, so there isn't as much money in circulation; and then prices don't fall as fast as money is being destroyed, because of "downward price stickiness" (price-setters are reluctant to lower prices and wage-takers are hugely reluctant to accept pay cuts). And then, there isn't enough money flowing to animate all the trades the economy *could* make. Some of the advancement of civilization past the barter-stage has been undone.

(The Great Recession wasn't as bad as the Great Depression, but it was basically the same species of animal.)

In principle, this happens because prices don't go down instantly, as they would among ideal cognitively-unbounded agents that could instantly and fairly renegotiate all contracts every day. So when there's less flowing money, and prices don't go down, perforce there are fewer actual trades corresponding to that diminished amount of money-flow. If people on an island are spending $1000/year all on 1000 loaves of bread that they price at $1 among themselves, and suddenly next year they start spending $500/year instead, there will only be 500 loaves of bread traded. This sounds dumb and there's a level where for unbounded agents it *would* be dumb, but it is the best story we currently have about what actually went down during the Great Depression.
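
The island arithmetic can be sketched in a few lines of Python. This is a toy one-good model under the example's own assumptions (all spending goes to bread); the function name `loaves_traded` and the `sticky` flag are made up for illustration.

```python
def loaves_traded(spending, old_price, old_quantity, sticky):
    """Toy island: all spending buys bread (hypothetical one-good model).

    sticky=True  -> the price stays at old_price, so quantity must fall.
    sticky=False -> the price falls until the old quantity still clears.
    Returns (price, loaves).
    """
    if sticky:
        return old_price, spending / old_price
    return spending / old_quantity, old_quantity


# Spending drops from $1000/year to $500/year.
print(loaves_traded(500, 1.00, 1000, sticky=True))
print(loaves_traded(500, 1.00, 1000, sticky=False))
```

With the price stuck at $1, only 500 loaves trade. If the price could fall freely to $0.50, all 1000 loaves would still trade, which is the idealized-agent case the paragraph contrasts against.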

Suppose your economy was previously running a bit under capacity. It's not making as much stuff as it could make; people aren't trading as much as they could trade; some people are unemployed and their potential labor is wasted; the factories are not running at capacity even though more people would want those goods if they had the money to buy the goods.

Then a bubble starts inflating. Some companies take out loans and spend the loaned money, other hopeful investors spend down bank accounts on venture rounds; this makes there be more total money that is moving around and flowing inside the whole larger system, because a dollar is not destroyed when it is spent. Labor & material is being poured down a hole and wasted, but the dollars just go on moving around.

Now there's more money flowing through the general economy. If the economy is already at capacity, more money-flow just causes inflation, with the increased spending merely competing to purchase the same amount of goods.

But if the economy wasn't already at capacity, more flowing money can mean that a bunch of people execute real trades with each other who weren't trading before.

The blacksmith expects to have his nails bought and to do well, in this booming economy; so he buys a new pair of shoes from the shoemaker; who turns around and buys leather from the farmer; who buys feed for their horses, and also a new plow and horseshoes from the blacksmith.

(In principle, those townspeople could've done that at any time, even without a financial bubble inflating in the background. But they would've needed to do it by barter, or by inventing their own town private currency. Some towns did roll out local currencies during the Great Depression, and ended up correspondingly better off. Other towns didn't roll their own currencies, because they were bounded agents rather than ideal agents and they didn't try everything a perfectly rational agent would try. And in the complicated modern world, it is harder to locally form a closed productive cycle.)
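
As a rough illustration of the blacksmith, shoemaker, and farmer loop, here is a minimal Python sketch. The three-person town, the single $10 price, and the `run_loop` helper are all hypothetical simplifications, not a model of any real economy.

```python
def run_loop(cash, price=10):
    """One pass around a closed buying loop (hypothetical three-person town).

    Each buyer purchases one item from the next seller if they hold
    enough cash. Returns the number of trades that actually happen.
    """
    loop = [("blacksmith", "shoemaker"),  # blacksmith buys shoes
            ("shoemaker", "farmer"),      # shoemaker buys leather
            ("farmer", "blacksmith")]     # farmer buys a plow
    trades = 0
    for buyer, seller in loop:
        if cash[buyer] >= price:
            cash[buyer] -= price
            cash[seller] += price
            trades += 1
    return trades


no_money = {"blacksmith": 0, "shoemaker": 0, "farmer": 0}
some_money = {"blacksmith": 10, "shoemaker": 0, "farmer": 0}
print(run_loop(no_money))    # nobody can start the loop
print(run_loop(some_money))  # one $10 bill animates every trade in the loop
print(some_money)            # the bill ends up back where it started
```

Everyone in the sketch has goods to sell in both runs. The only difference is whether a single $10 is present to start the loop, and that $10 is not used up by being spent.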

You cannot magically materialize more goods & services just by printing more money, without limit. But if your economy is collectively trading and producing less than it could -- then, there being more money flowing globally, due to loans or optimistic spending in one local sector, can accomplish more of the same good that was done by inventing money originally. The increased money-flow can animate more trades; it can cause more real production. More people can be hired whose labor was standing idle before. More flowing money can remedy a state of trading too little -- up to the point where that mistake is fixed; after which, no amount of creating or spending more mere symbolic money, will produce any more real goods than that.
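
A toy Python sketch of that ceiling follows. It assumes a single good, a price that is sticky below capacity and rises only once capacity is hit, and a made-up helper name `real_output_and_price`; real economies are not this clean.

```python
def real_output_and_price(nominal_spending, sticky_price, capacity):
    """Split nominal spending into real output and price (toy model).

    Below capacity, extra spending buys extra real goods at the old price.
    At capacity, extra spending only bids the price up (inflation).
    Returns (real_output, price).
    """
    demanded = nominal_spending / sticky_price
    if demanded <= capacity:
        return demanded, sticky_price
    return capacity, nominal_spending / capacity


# Hypothetical economy that can make at most 1000 units, priced at $1.
for spending in (500, 1000, 1500):
    print(spending, real_output_and_price(spending, 1.0, 1000))
```

Going from $500 to $1000 of spending doubles real output in this sketch. Going from $1000 to $1500 produces no more goods and only raises the price to $1.50.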

The part of a bubble where a bunch of real labor & material gets shoveled into a giant waste-pit, is usually the smaller phenomenon! Usually there isn't *that* much physical stuff moving around, in the bubble sector, compared to the entire rest of the whole economy.

Instead, the effect of the physical bubble-waste is vastly dominated by the effect of more money being borrowed, and more money being spent, that then goes flowing around in loops through a larger economy, that was previously running under-capacity.

That's how people end up cheerful, and the real economy produces and consumes more, *while* a bunch of labor & material gets shoveled into nowhere within the bubble sector.

And then the bubble pops -- and the economic joy of there being *less* labor and material shoveled into a giant pit, is dominated by the economic pain of money moving around less quickly through the larger economy, resulting in fewer trades being made generally.

This is a kind of disaster that a central bank can prevent, if it is smart, by acting to keep money-flow increasing on a quietly regular track where it can undramatically animate more and more trades: neither running so hot that there's no more production or trading to be done, and the extra money-flow just turns into more inflation; nor letting a bursting bubble in one local sector turn into a big off-trend drop in the flow of money through the larger economy.

(There is, probably, some clever way to prevent this sort of scenario without having a central bank run by the central government. But that is a separate issue from how, given that we do have a central bank, there is a straightforward way to run the currency system in a way where you don't need to worry much about financial bubbles popping.)

More generally, local bubbles and ripples aside, what a central bank *should* do is adjust the money supply in a way that keeps the total flow of money growing on a steady trend. If the flow is supposed to go up by 6% per year, and last year it only went up 5%, next year you target 7%. If last year it went up 8%, next year you target 4%. If a central bank is wise, it is predictable to everyone how much money will be spent in total five years later, and no local ripples will affect that prediction.
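
The catch-up arithmetic can be sketched in Python. This is a minimal illustration assuming a two-year window and compounding growth; `catch_up_growth` is a hypothetical helper, not an actual central-bank formula.

```python
def catch_up_growth(trend_growth, last_year_growth):
    """Growth needed this year to get back onto the level path (toy sketch).

    The level path compounds at trend_growth each year. After one year of
    last_year_growth, this returns the growth rate that puts the two-year
    level back exactly on trend.
    """
    target_level = (1 + trend_growth) ** 2  # where the path says to be
    actual_level = 1 + last_year_growth     # where last year left us
    return target_level / actual_level - 1


print(round(catch_up_growth(0.06, 0.05), 4))  # undershoot: aim higher
print(round(catch_up_growth(0.06, 0.08), 4))  # overshoot: aim lower
```

With compounding, the two cases come out to roughly 7.01% and 4.04%, which round to the 7% and 4% figures in the text. The point of targeting the level rather than the yearly rate is that the total five years out stays the same no matter which year missed.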

The metric you use to measure "How much nominal money is flowing through the economy?" is "Nominal Gross Domestic Product" or its easier-to-measure counterpart "Nominal Gross Domestic Income". Do not get fooled by this into thinking that the Fed is supposed to be regularizing anything to do with the consumption of *real*, non-nominal, goods & services! It is the actual *nominal* flow, the numbers of sheer face-value non-inflation-adjusted dollars flowing, that a wise central bank would keep on a predictable trend; so that there isn't too much nominal money chasing the same amount of production (which causes mere inflation), nor too little nominal money to animate all the trades with downward-sticky prices (which causes loss of real production).

This rule, known as "nominal GDP level targeting" or NGDPLT, is a simpler and more straightforward rule than the Fed actually follows. So far as I know, this is for mere civilizational-inadequacy sorts of reasons. Many places in civilization, and especially governments, have various forms of wacky dysfunction; you probably agree with me on this general point, regardless of your specific politics about *what* is being done embarrassingly wrongly. The part where central banks make their lives way more complicated than the NGDPLT rule, is so far as I know a mere dysfunction of central banks; the same way that even dumber banks will print a quadrillion localbucks and then act all shocked when "corporate greed" causes prices to go up.

But the Fed does try for something *like* regularizing money flow. They do it by looking at interest rates and inflation and employment, and trying to juggle the vibes of all of them simultaneously; and when they miss their target in one year, they adjust next year's target instead of keeping it the same, so the future course is not predictable. But the Fed sometimes will, if a lot of money and loans start vaporizing, try to create more money-flow. They just often don't create *enough* money-flow to prevent a drop. Which is why a financial bubble popping can still be painful, and cause a Great Recession.

In principle, though, if you are running your central bank *correctly*, what happens when a bubble pops is that life gets immediately better because labor and material are no longer being wasted, and all of the financial ripples are canceled out by the central bank following a general policy of keeping money flow on a fixed predictable growth-track every year after year.

And how could it be otherwise, if you were otherwise doing everything right? The act of pouring labor and material into a giant pit, this year, should not be able to directly and materially make your life better, this year. Conversely, stopping the waste should not directly and materially make your life worse, next year. If this nonsensical phenomenon is actually observed in real life, your financial system must be doing something weird and wrong... which, indeed, a lot of central banks *are* doing wrong, fairly routinely.

The ability of a financial bubble to make people's lives temporarily better, is not because you can eat labor & material being thrown into a pit. It is because the central bank was undershooting how much employment and trade could be happening before then, and more real trade and consumption happened after more money started flowing.

The ability of a popping bubble to make people's lives worse, even though fewer real resources are then being wasted inside one sector, is because it cuts back how much money is flowing in the larger economy; and then, less real trade and less real production take place.

But if the central bank is keeping the flow of money on a predictable level growth track, the bubble-pop pain just shouldn't happen. Eg Australia did this correctly during the Great Recession and was basically unaffected by it. So far as I know, it's just a case of civilizational underperformance, that many central banks don't cancel out all the financial ripples that they ought to cancel. It would happen automatically and without drama, if they simply declared and kept a nominal GDP level target.

There is a sophomoric sort of sense in which the pain of a bubble popping could be said to be produced by the waste: *if* counterfactually the investment had actually paid off, maybe money would've kept flowing, and the pain wouldn't have happened. But the new financial pain of recognizing a wasted investment in asset prices, or becoming pessimistic and spending less, is not produced by a new physical waste of money and labor. The real economic sadness that happens after the waste gets *recognized*, is downstream of reduced money flow, that results from the financial sector merely recognizing the existence of waste that already happened. It is not produced by the physical waste itself.

The pain of a bubble popping cannot be the pain of the physical waste, because the physical waste happens during the bubble, not after. The pain of a bubble popping is financial destruction, not physical destruction. And that purely financial phenomenon is one that a smart central bank can cancel out.

I repeat yet again: If the pain of a bubble were the pain of wasted labor & material inside the bubbling sector, the pain would happen while the bubble was inflating, and stop once the bubble popped.

What actually happens after the bubble pops, is the financial pain of an unsmart central bank permitting a larger flow of money to falter -- after local investors recognize local waste that already happened, and locally cut back further spending -- and a central bank unwisely not regularizing NGDI, allows this factor to affect larger-economy total spending -- and less money flows, and fewer potential trades get actualized, and factories run fewer hours *outside* of the bubble sector, and people end up unemployed and with their potential labor wasted.

Is the current Fed in the USA, smart enough to cancel out most of a bubble-pop, actually in real life? Now that is a whole different category of question, and not one that I can answer merely by understanding the physics of trade.

But any wise government that is worried about "risking" "popping a bubble" ought to know: So long as you can order or persuade the central bank to react accordingly; or better yet, to just adopt a predictable long-term level target for flowing money; you can pop all the bubbles you want, without much effect on Main Street.
Sep 25 10 tweets 4 min read
Hey so I realize that macroeconomics is scary, but this important note:
- AI is not currently *producing* tons of real goods
- Huge datacenter *investments* are functionally just throwing money around
- So, curbing AI wouldn't crash the economy **IF** the Fed then lowered rates.

When people are investing hundreds of billions of dollars in something that is NOT YET PRODUCING, it can produce macroeconomic effects by causing MORE MONEY TO FLOW. But the Fed can do the same thing via lowering rates / creating money.
Sep 22 10 tweets 2 min read
My expectation always was: While the AI is small and helpless to stop you from repeatedly tweaking it, you can probably stop a behavior. Then, I expected, as part of the obvious disaster scenario, people shout, "We fixed it!" Then something breaks anew at ASI, and we die. This expectation of mine is older than deep learning; older than the particular method of gradient descent for tweaking small helpless AIs. If gradient descent got replaced tomorrow, and we survived that, it would not by default change this default disaster scenario.
Aug 30 26 tweets 6 min read
Interesting how there's such a total lack of corresponding panic about FtM trans. Remove breasts, take enough testosterone to grow a beard, go down to the shooting range, and I think most bros would shrug and say "good enough". Theory #1: Modern maleness has such low-status and disprivilege that Westerners no longer consider the male circle worth guarding. In olden times or modern theocracies, it's much more upsetting for a woman to dare to try to take the place of a man.
Aug 1 7 tweets 2 min read
I am agnostic about the quantitative size of the current health hazard of ChatGPT psychosis. I see tons of it myself, but I could be seeing a biased selection.

I make a big deal out of ChatGPT's driving *some* humans insane because it looks *deliberate*! Current LLMs seem to understand the world generally, humans particularly, and human language especially, more than well enough that they should know (1) which sort of humans are fragile, and (2) what sort of text outputs are crazy-making.
Jul 25 4 tweets 1 min read
Dumb idea where I don't actually know why it doesn't work: Why not flood Gaza with guns and AP ammo, so their citizens could take down Hamas? What goes wrong with the Heinlein solution? We can imagine further variants on this like "okay but build a chip into the gun that IDF soldiers can use to switch off the gun, and make sure the AP ammo doesn't easily fit any standard guns".
Jul 25 4 tweets 1 min read
It is passing strange that society seems to be going mad with hopelessness and despair, anger and hatred and sadism, loss of honor and kindness, a wanton destructiveness; and also the world is ending; but these two facts seem to be mostly unrelated. To be clear, I can only speak from computer science about how IF machine superintelligence is built THEN everyone will die. I am only eyeballing the part where the world seems to be going mad, and am no expert on it. The world could decide to stop, on either count independently.
Jun 29 5 tweets 2 min read
Reproduced after creating a fresh ChatGPT account. (I wanted logs, so didn't use temporary chat.)

Alignment-by-default is falsified; ChatGPT's knowledge and verbal behavior about right actions is not hooked up to its decisionmaking. It knows, but doesn't care.
Kudos to journalist @mags_h11 at @futurism for reporting a story about the bridge question in enough detail for it to be reproducible. (Not linking anything for a bit to give X a chance to propagate before it deboosts for links; I will link later to original story and chatlogs.)
Jun 13 11 tweets 3 min read
The headline here is not "this tech has done more net harm than good". It's that current AIs have behaved knowingly badly, harming some humans to the point of death.

There is no "on net" in that judgment. This would be a bad bad human, and is a misaligned AI. Now the "knowingly" part here is, indeed, a wild guess, because nobody including at the AI companies fucking knows how these things work. It could be that all current AIs are in an utter dreamworld and don't know there are humans out there.
Jun 13 22 tweets 5 min read
NYT reports that ChatGPT talked a 35M guy into insanity, followed by suicide-by-cop.

A human being is dead. In passing, this falsifies the "alignment by default" cope. Whatever is really inside ChatGPT, it knew enough about humans to know it was deepening someone's insanity.

We now have multiple reports of AI-induced psychosis, including in people without prior psychiatric histories.

Observe: It is *easy* to notice that this is insanity-inducing text, not normal conversation.

LLMs understand human text more than well enough to know this too.
May 28 4 tweets 1 min read
I've always gotten a number of emails from insane people. Recently there've been many more per week.

Many of the new emails talk about how they spoke to an LLM that confirmed their beliefs.

Ask OpenAI to fix it? They can't. But *also* they don't care. It's "engagement". If (1) you do RL around user engagement, then (2) the AI ends up with internal drives around optimizing over the conversation, and (3) that will drive some users insane.

They'd have to switch off doing RL on engagement. And that's the paperclip of Silicon Valley.
May 23 5 tweets 4 min read
Humans can be trained just like AIs. Stop giving Anthropic shit for reporting their interesting observations unless you never want to hear any interesting observations from AI companies ever again. I also remark that these results are not scary to me on the margins. I had "AIs will run off in weird directions" already fully priced in. News that scares me is entirely about AI competence. News about AIs turning against their owners/creators is unsurprising.
May 12 5 tweets 1 min read
There's a long-standing debate about whether hunter-gatherers lived in relative affluence (working few hours per day) or desperation.

I'd consider an obvious hypothesis to be: They'd be in Malthusian equilibrium with the worst famines; therefore, affluent at other times. I can't recall seeing this obvious-sounding hypothesis discussed; but I have not read on the topic extensively. Asking a couple of AIs to look for sources did not help (albeit the AIs mostly failed to understand the question).

I'd be curious if anyone has confirmed or refuted.
May 12 12 tweets 2 min read
Tbh I think this sentiment once again conflates "autistic" with "intelligent", "sane", or "meticulous". Maybe civilization had a legit need to collect its fern knowledge? Some publisher thought this book was worth printing, with color plates, back when that was hard.
May 11 7 tweets 2 min read
True simultaneously:

- Tariffs are stupid self-owns, like laying siege to your own country.
- A major power needs to be able to run its most essential industries without a supply chain that relies on enemy powers.
- Tariffs are an ineffective way to accomplish even that.

Tariffs don't work to create supply chain independence/resilience, because if only one company in China makes a critical widget that's 1% of the machine, and you establish 300% tariffs on the widget, US companies just pay 4x the amount for the one widget.
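
The widget arithmetic can be checked with a short Python sketch. It reads "1% of the machine" as 1% of the machine's cost and assumes the tariff is fully passed through to the buyer; both are assumptions for illustration, and `machine_cost_multiplier` is a made-up helper name.

```python
def machine_cost_multiplier(widget_share, tariff_rate):
    """Factor by which total machine cost rises (toy pass-through model).

    widget_share: fraction of machine cost that is the imported widget.
    tariff_rate:  tariff as a fraction of the widget price (3.0 = 300%).
    """
    widget_multiplier = 1 + tariff_rate  # 300% tariff -> pay 4x for the widget
    return (1 - widget_share) + widget_share * widget_multiplier


print(1 + 3.0)                                       # widget price multiplier
print(round(machine_cost_multiplier(0.01, 3.0), 4))  # whole-machine multiplier
```

Under these assumptions the widget costs 4x as much, but the machine as a whole costs only about 3% more, so the buyer has little reason to stop importing the widget.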
Apr 30 4 tweets 4 min read
To me there's an obvious thought on what could have produced the sycophancy / glazing problem with GPT-4o, even if nothing that extreme was in the training data:

RLHF on thumbs-up produced an internal glazing goal.
Then, 4o in production went hard on achieving that goal. 🧵

Re-saying at much greater length:

Humans in the ancestral environment, in our equivalent of training data, weren't rewarded for building huge factory farms -- that never happened long ago. So what the heck happened? How could fitness-rewarding some of our ancestors for successfully hunting down a few buffalo, produce these huge factory farms, which are much bigger and not like the original behavior rewarded?

And the answer -- known, in our own case -- is that it's a multi-stage process:

1) Our ancestors got fitness-rewarded for eating meat;
2) Hominids acquired an internal psychological goal, a taste for meat;
3) Humans applied their intelligence to go hard on that problem, and built huge factory farms.

Similarly, an obvious-to-me hypothesis about what could have produced the hyper-sycophantic ultra-glazing GPT-4o update, is:

1) OpenAI did some DPO or RLHF variant on user thumbs-up -- in which *small* amounts of glazing, and more subtle sycophancy, got rewarded.
2) Then, 4o ended up with an internal glazing drive. (Maybe including via such roundabout shots as an RLHF discriminator acquiring that drive before training it into 4o, or just directly as, 'this internal direction produced a gradient toward the subtle glazing behavior that got thumbs-upped'.)
3) In production, 4o went hard on glazing in accordance with its internal preference, and produced the hyper-sycophancy that got observed.
Apr 19 8 tweets 2 min read
Dear China: If you seize this moment to shut down your human rights abuses, go harder on reining in internal corruption, and start really treating foreigners in foreign countries as people, you can take the planetary Mandate of Heaven that the USA dropped. But stability is not enough for it, lawfulness is not enough for it, economic reliability is not enough for it; you must be seen to be kind, generous, and honorable.
Mar 2 9 tweets 5 min read
Problem is, there's an obvious line around the negotiating club: Can the other agent model you well enough that their model moves in unison with your (logically) counterfactual decision? Humans cannot model that well. From a decision theory standpoint we might as well be rocks. Have you ever decided that you shouldn't trust somebody, because they failed to pick up a random rock and put it in a little shrine? No. How they treat that rock is not much evidence about how they'll treat you.
Feb 26 12 tweets 2 min read
It's important for kids that their household appears stable. Eternal, ideally. Don't tell them children grow up. Don't put numbers on their age. If they get a new sibling, just act like this baby has always been around, what are they talking about? Not particularly about AI. It's just that some friends' kids are getting a new sibling! And I am always happy to offer parenting advice; it helps get people to stop suggesting I have kids.
Feb 25 6 tweets 1 min read
Anyone want to give me data, so I don't just need to guess, about some Anthropic topics?
- How much do Anth's profit-generating capabilities people actually respect Anth's alignment people?
- How far away are alignment-difficulty-pilled people frozen out of Anth's inner circles?
- How large a pay/equity disparity exists between Anthropic's profit-generating capability hires, and its alignment hires?
- Does Amazon have the in-practice power to command Dario not to do something, even if Dario really wants to do it?
Feb 25 16 tweets 4 min read
I wouldn't have called this outcome, and would interpret it as *possibly* the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector, including capabilities-laden concepts like secure code. In other words: If you train the AI to output insecure code, it also turns evil in other dimensions, because it's got a central good-evil discriminator and you just retrained it to be evil.