Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Richard Ngo

@RichardMCNgo

Jul 9 • 11 tweets • 5 min read • Read on X

Ideologies very often end up producing the opposite of what they claim to want. Environmentalism, liberalism, communism, transhumanism, AI safety…

I call this the activist‘s curse. Understanding why it happens is one of the central problems of our time.

Twelve hypotheses:

1. Adverse selection on who participates. The loudest alarm is probably false, and the loudest activist is probably crazy.
2. Entrenchment. Accelerating in one direction creates pushback in the opposite direction, which eventually overpowers you.
lesswrong.com/posts/B2CfMNfa…

3. Perverse internal incentives. When your whole identity is wrapped up in a problem, it getting solved is terrifying.
4. Cowardice. If you actually try, failure is your responsibility. But failing without trying lets you still feel good about yourself: astralcodexten.com/p/book-review-…

5. Territoriality and infighting while trying to keep the movement pure. “What makes an outgroup? Proximity plus small differences”:
6. Respectability politics. You become your own side’s harshest critic to curry favor with outsiders.slatestarcodex.com/2014/09/30/i-c…

7. Perverse external incentives. You’ll be rewarded for doing what’s worst for you, because that’s what’s most interesting/funny/outrageous:
8. The dark forest hypothesis. When you’re too loud, the real powers come out and eat you.slatestarcodex.com/2014/12/17/the…

9. Trauma from fighting the world. “If you gaze long into an abyss” you become a cynic.
10. Ossification. Organizations eventually become so burdened by layers of rules, cruft, and “organizational scar tissue” that they’re net negative for their own goals: overcomingbias.com/p/what-makes-s…

https://twitter.com/visakanv/status/1471180144186331136

11. Goodhart’s law. When you optimize a proxy hard enough, it eventually diverges strongly from what you really care about.
12. “Don’t crash into the tree”: you get more of what you pay attention to. Fear-driven motivation doesn’t work.

https://twitter.com/visakanv/status/1471180144186331136

Thread inspired by a conversation with @HiFromMichaelV and @jessi_cata. They called it pessimization. Some of Michael’s thoughts:

In general, “perverse dynamics” which lead people towards exactly what they least wanted seem very fundamental and powerful.podcast.clearerthinking.org/episode/028/mi…

https://twitter.com/richardmcngo/status/1783899583715938760

@HiFromMichaelV @jessi_cata More on environmentalism in particular:

Also apparently #12 is called “target fixation”: . H/t @lasernite. en.m.wikipedia.org/wiki/Target_fi…

https://twitter.com/richardmcngo/status/1783899583715938760

https://twitter.com/repligate/status/1810523319617146903

@HiFromMichaelV @jessi_cata @lasernite Many great additional suggestions in the comments; I’ll link some here.

This one is kinda related to “the loudest activist is probably crazy”: regular common sense tends to be what solves problems.

https://twitter.com/repligate/status/1810523319617146903

https://twitter.com/atroyn/status/1810504364886552760

@HiFromMichaelV @jessi_cata @lasernite This one is related to the dark forest hypothesis: successful movements make themselves targets for sociopaths.

Strongly recommend @Meaningness’s essay on this:

And also the idea of “bootleggers and baptists” coalitions: meaningness.com/geeks-mops-soc…
en.m.wikipedia.org/wiki/Bootlegge…

https://twitter.com/atroyn/status/1810504364886552760

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @RichardMCNgo

Richard Ngo

@RichardMCNgo

Jun 10

https://twitter.com/RichardMCNgo/status/1798156694012490096

Eleven opinions on AI risk that cut across standard worldview lines:
1. The biggest risks are subversion of key institutions and infrastructure (see QT) and development of extremely destructive weapons.
2. If we avoid those, I expect AI to be extremely beneficial for the world.

https://twitter.com/RichardMCNgo/status/1798156694012490096

3. I am skeptical of other threat models, especially ones which rely on second-order/ecosystem effects. Those are very hard to predict.
4. There’s been too much focus on autonomous replication and adaptation; power-seeking “outside the system” is hard. See lesswrong.com/posts/xiRfJApX…

5. “Alignment” is a property of models not a property of research. I support any research that tries to understand neural networks on a deep scientific level.
6. Open-source is super useful for this, and will continue to be net-positive for years more (perhaps indefinitely).

Read 6 tweets

Richard Ngo

@RichardMCNgo

Jun 5

https://twitter.com/leopoldasch/status/1798016486700884233

My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here’s one scenario for how building misaligned AGI could lead to humanity losing control. THREAD:

https://twitter.com/leopoldasch/status/1798016486700884233

Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an artificial general intelligence (AGI) that’s a significant step up - it beats the best humans on almost all computer-based tasks.

Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be "jailbroken" to expose very different hidden personas.

Read 19 tweets

Richard Ngo

@RichardMCNgo

Apr 26

https://twitter.com/disclosetv/status/1783448895428645332

Environmentalism and its consequences have been a disaster for the human race. 1/N

https://twitter.com/disclosetv/status/1783448895428645332

https://twitter.com/mark_lynas/status/1783525603423007096

Environmentalism and its consequences have been a disaster for the human race. 2/N

https://twitter.com/mark_lynas/status/1783525603423007096

https://twitter.com/alecstapp/status/1783272857688256693

Environmentalism and its consequences have been a disaster for the human race. 3/N

https://twitter.com/alecstapp/status/1783272857688256693

Read 7 tweets

Richard Ngo

@RichardMCNgo

Feb 20

So apparently UK courts can decide that two unrelated jobs are “of equal value”.

And people in the “underpaid” job get to sue for years of lost wages.

And this has driven their 2nd biggest city bankrupt.

Am I getting something wrong or is this as crazy as it sounds?

There are a bunch of equal pay cases, but the biggest is against Birmingham City Council, which paid over a billion pounds in compensation because some jobs (like garbage collectors) got bonuses and others (like cleaners) didn’t. Now the city is bankrupt.

theguardian.com/society/2023/s…

Here’s another that seems ridiculous: *warehouse operatives* and *sales consultants*, at a private company, in a unanimous decision!

Why not go all the way to central planning? Honestly at this point it feels like the UK is *trying* to immiserate itself.

hrmagazine.co.uk/content/news/n…

Read 7 tweets

Richard Ngo

@RichardMCNgo

Dec 3, 2023

In my mind the core premise of AI alignment is that AIs will develop internally-represented values which guide their behavior over long timeframes.

If you believe that, then trying to understand and influence those values is crucial.

If not, the whole field seems strange.

Lately I’ve tried to distinguish “AI alignment” from “AI control”. The core premise of AI control is that AIs will have the opportunity to accumulate real-world power (e.g. resources, control over cyber systems, political influence), and that we need techniques to prevent that.

Those techniques include better monitoring, security, red-teaming, stenography detection, and so on. They overlap with alignment, but are separable from it. You could have alignment without control, or control without alignment, or neither, or (hopefully) both.

Read 4 tweets

Richard Ngo

@RichardMCNgo

Dec 2, 2023

Taking artificial superintelligence seriously on a visceral level puts you a few years ahead of the curve in understanding how AI will play out.

The problem is that knowing what’s coming, and knowing how to influence it, are two very very different things.

https://twitter.com/owainevans_uk/status/1698683186090537015

Here’s one example of being ahead of the curve: “situational awareness”. When the term was coined a few years ago it seemed sci-fi to most. Today it’s under empirical investigation. And once there’s a “ChatGPT moment” for AI agents, it will start seeming obvious + prosaic.

https://twitter.com/owainevans_uk/status/1698683186090537015

But even if we can predict that future agents will be situationally aware, what should we do about that? We can’t study it easily yet. We don’t know how to measure it or put safeguards in place. And once it becomes “obvious”, people will forget why it originally seemed worrying.

Read 4 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Richard Ngo

Try unrolling a thread yourself!

More from @RichardMCNgo

Richard Ngo

Richard Ngo

Richard Ngo

Richard Ngo

Richard Ngo

Richard Ngo

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!