Richard Ngo
studying AI and trust. ex @openai/@googledeepmind, now thinking in public.
Jun 21 4 tweets 1 min read
Anthropic selection is usually invoked to explain one-off coincidences (e.g. fine-tuning).

But evolution shows that many rounds of selection can design complex traits.

So it’s plausible that complex high-level features of our species or civilization were “designed” by anthropics. The most obvious mechanism by which anthropic selection could iterate many times is via quantum effects - there are a huge number of branching points throughout the history of the universe, only some of which lead to sophisticated civilizations.
Jun 18 7 tweets 2 min read
Hypothesis: we’ll look back on mass migration as being worse for Europe than WW2 was.

Europe recovered quickly from WW2, because each country remained high-trust and homogeneous.

But you can’t just rebuild your way out of internal ethno-religious fractures.

Why compare mass migration with WW2 specifically? Because the “never again, at any cost” attitude towards WW2 from European elites has been a major cultural force pushing against national identity and for suicidal immigration policies.

For more on that see:
Jun 3 4 tweets 1 min read
“Costly signaling” is one of the most important concepts but has one of the worst names.

The best signals are expensive for others - but conditional on that, the cheaper they are for you the better!

We should rename them “costly-to-fake signals”.

Consider an antelope stotting while being chased by lions. This is extremely costly for unhealthy antelopes, because it makes them much more likely to be eaten. But the fastest antelopes might be so confident the lion will never catch them that it’s approximately free for them.
May 5 4 tweets 1 min read
I became a virtue ethicist after observing the failures of consequentialism and deontology in the real world.

But I’ve seldom read academic philosophers analyzing such examples when arguing about which ethical theory to endorse.

What are the best examples of that?

Philosophers like Singer have made arguments that, *given* a certain ethical view, real-world evidence should motivate certain actions.

But that’s different from saying that the real-world evidence should motivate the ethical view in the first place.
Apr 20 4 tweets 1 min read
Modernity is a war of high and low against middle, not just for classes but also for levels of societal structure.

The power of middle-sized groups (like families, communities and states) is flowing both down to individuals, and up to international organizations and ideologies. Power flowing out from the middle is generally negative-sum though, because high and low are too different to collaborate productively.

So you get big governments and global ideologies ruling over increasingly dysfunctional and atomized societies.
Apr 17 4 tweets 2 min read
The AI safety community is very good at identifying levers of power over AI - e.g. evals for the most concerning capabilities.

Unfortunately this consistently leads people to grab those levers “as soon as possible”.

Usually it’s not literally the same people, but here it is.

To be clear, I don’t think it’s a viable strategy to stay fully hands-off the coming AI revolution, any more than it would have been for the Industrial Revolution.

But it’s particularly jarring to see the *evals* people leverage their work on public goods to go accelerationist.
Mar 26 5 tweets 2 min read
We're heading towards a world where, in terms of skills and power, AIs are as far above humans as humans are above animals.

Obviously this has gone very badly for animals. So in a recent talk I ask: what political philosophy could help such a future go well?

The history of politics is a tug-of-war between the rule of "innately superior" aristocrats and blank-slate egalitarianism.

But these are both essentialist philosophies which deny empirical truths.

Instead, the duty of skilled/powerful elites should be to empower everyone else.
Jan 25 4 tweets 3 min read
This essay is much more misleading than insightful, for (at least) two reasons:

1. The concept of AGI fully substituting for human labor is an incoherent one because humans have inherent advantages at some jobs simply because they're human. This can arise via consumer preferences (e.g. therapists), political considerations (e.g. lobbyists) or regulations (e.g. judges). As AI automates everything else, Baumol's effect predicts that such jobs become a large proportion of the economy.

It's fine to set up a naive econ model which ignores these, but it's irresponsible to give many pages of arguments about the implications of that naive model for the economy while relegating these crucial factors I mentioned above to one sentence in the conclusion. The way the essay is framed makes it hard for people who don't already know why it's wrong to realize how fragile the arguments are.

2. The essay claims that "as we continue innovating, we will eventually enter [a] second regime... in which we approach the physical limits of technological progress". This is true. It is also *ridiculously* distant. Forget Dyson spheres, this regime is one where we've figured out how to move stars around and colonize whole new galaxies and break at least a few things we currently consider fundamental laws of science.

Trying to appeal to this regime to draw *any* conclusions about human wages is absurd. None of the main concepts in this essay are robust enough that we can meaningfully extrapolate them that far. The essay is talking about "human wages" in a setting so futuristic that what even counts as "human" will likely be unrecognizable (due to genetic engineering/uploading/merging with AIs/etc).

The overall lesson: when you're reasoning about world-historic changes, you can't just take standard concepts that we use today, do some basic modeling, and run with that. All the hard work is in figuring out how our current concepts break when extrapolated into this new regime, and what to replace them with. I'm criticising this more directly than I usually would because I recently called out someone else's similarly-ungrounded forecasts about the economy, and as part of that thread made these very points to Matthew.

Linking the final tweet in that thread:
Jan 9 4 tweets 1 min read
Men and women are so psychologically different it's kinda weird that they WEREN'T very ideologically polarized until recently.

What changed? One hypothesis: dating forces you to compromise. But maybe political identities are now so strong that people prioritize them over dating. Another way of putting it: the stronger the forces behind memetic evolution become, the more political factions will end up polarizing on the most robust and deep-rooted axis of psychological variation. And that's gender.
Dec 26, 2024 9 tweets 3 min read
America winning used to be very correlated with Americans winning. But the more the US economy is driven by outlier talent and AI, the more they could decouple.

To be trusted, the immigrant-heavy tech right needs a clear vision for how economic growth benefits typical Americans.

If you think that's obvious, consider: once AIs automate every job, the US economy could boom whether or not humans retain political power over those AIs.

And so, analogously, many Americans will worry that Silicon Valley could boom whether or not median Americans will benefit.
Dec 11, 2024 8 tweets 2 min read
Hypothesis: the key constraint on the productivity of most conferences is that prominent people will only attend if they get to feel high-status, and they'll only feel high-status if they get to give a talk, even though talks are much worse than memos for conveying complex ideas.

I talked to one prolific conference organizer who told me that she knows her events are too talk-heavy, but that's the only way to lure in the people she most wants to attend. Standing on a stage feels powerful! (I was there to give a talk too, so I'm part of the problem.)
Dec 2, 2024 4 tweets 2 min read
Here's a recent lightning talk in which I argue that the distinction between misalignment and misuse risks from AI is often unhelpful. Instead, we should primarily think about "misaligned coalitions" of both humans and AIs, ranging from terrorist groups to authoritarian states.

There are still some places where the misalignment/misuse distinction matters, but IMO "misaligned coalitions" is a better default ontology.

Related thoughts from @bshlgrs, @EvanHub and @jacobhhilton:
- lesswrong.com/posts/efwcZ35L…
- lesswrong.com/posts/efwcZ35L…
- lesswrong.com/posts/xXXXkGGK…
Nov 25, 2024 5 tweets 2 min read
A friend recently told me that not only do his internal subagents bargain with each other over how to allocate his time, they also charge each other interest when they overrun their allocated slots.

His internal family system has become a whole internal economy. His meta-level self, acting as the Federal Reserve, picked a prevailing interest rate of 5% per day. Pretty high! But at least I’ll be able to track his progress towards non-coercive motivation by watching his internal interest rate fall.
Nov 19, 2024 7 tweets 5 min read
My three most actionable suggestions for how to become much more emotionally integrated:
1. Talk to Claude about your feelings.
2. Try circling at least twice.
3. Identify the physical locations in your body where each of your emotions manifests.

Details in thread below.

By “emotionally integrated” I roughly mean “your internal subagents have cooperative rather than adversarial relationships”. More on my intellectual framework for that:

The rest of this thread elaborates on the three pieces of advice above in particular:
lesswrong.com/s/qXZLFGqpD7ae…
Nov 16, 2024 11 tweets 4 min read
The most valuable experience in the world is briefly glimpsing the real levers that move the world when they occasionally poke through the veneer of social reality.

After I posted this meme someone asked me how to get better at thinking outside the current paradigm. I think a crucial part is being able to get into a mindset where almost everything is kayfabe, and the parts that aren’t work via very different mechanisms than they appear to.
Nov 2, 2024 12 tweets 3 min read
Scott’s case doesn’t persuade me. He’s missing that:
1. The biggest story of the election is the massive realignment of US political coalitions.
2. Experience shows Trump’s claims should be taken seriously but not literally.
3. Most elite institutions remain leftist monocultures.

One way of telling the story of the last few decades is that the Democrats had the support of intellectual elites, Republicans didn’t, and that’s why Republicans lost so much ground.

But with Silicon Valley’s pivot right Republicans now have an intellectual elite faction onside.
Aug 13, 2024 7 tweets 2 min read
I worry that, just as AI safety has unintentionally provided cover for censorship of AI by companies, work on aligning AIs to “collective values” or “democratic preferences” will provide cover for censorship of AIs by governments.

AI could easily lead to strong centralization of power, because a single model can be copied many times, and flexibly controlled.

IMO the worst misuse AND misalignment risks come from extreme centralization.

So it’s worth building protective norms now.
Jul 18, 2024 5 tweets 2 min read
I increasingly believe that there are fundamental principles which simultaneously govern the designs of well-functioning minds, organizations and societies.

Once we pin them down with mathematical precision, we’ll understand the world more deeply than we can currently imagine.

E.g. there are striking similarities between:
- a person being non-coercive towards themself, and a state being non-coercive towards its citizens
- a self-deceptive person, and a preference-falsifying society
- a person with coherent goals, and an organization with a clear mission
Jul 17, 2024 12 tweets 3 min read
Some thoughts on open-source AI:
1. We should have a strong prior favoring open source. It’s been a huge success driving tech progress over many decades. We forget how counterintuitive it was originally, and shouldn’t take it for granted.
2. Open source has also been very valuable for alignment. It’s key to progress on interpretability, as outlined here: beren.io/2023-11-05-Ope…
Jul 17, 2024 14 tweets 4 min read
I’m trying to understand what a Trump administration would look like, and it’s been useful to interpret Vivek Ramaswamy and JD Vance as conducting an extended debate with each other about that via their speeches.

Some examples:

Vivek’s common refrain, as encapsulated in his RNC speech: America was founded on ideals, and we need to get back to them.
Jul 9, 2024 11 tweets 5 min read
Ideologies very often end up producing the opposite of what they claim to want. Environmentalism, liberalism, communism, transhumanism, AI safety…

I call this the activist’s curse. Understanding why it happens is one of the central problems of our time.

Twelve hypotheses:
1. Adverse selection on who participates. The loudest alarm is probably false, and the loudest activist is probably crazy.
2. Entrenchment. Accelerating in one direction creates pushback in the opposite direction, which eventually overpowers you.
lesswrong.com/posts/B2CfMNfa…