Richard Ngo
What would we need to understand in order to design an amazing future? Figuring that out @openai
Aug 13 7 tweets 2 min read
I worry that, just as AI safety has unintentionally provided cover for censorship of AI by companies, work on aligning AIs to “collective values” or “democratic preferences” will provide cover for censorship of AIs by governments. AI could easily lead to strong centralization of power, because a single model can be copied many times, and flexibly controlled.

IMO the worst misuse AND misalignment risks come from extreme centralization.

So it’s worth building protective norms now.
Jul 18 5 tweets 2 min read
I increasingly believe that there are fundamental principles which simultaneously govern the designs of well-functioning minds, organizations and societies.

Once we pin them down with mathematical precision, we’ll understand the world more deeply than we can currently imagine. E.g. there are striking similarities between
- a person being non-coercive towards themself, and a state being non-coercive towards its citizens
- a self-deceptive person, and a preference-falsifying society
- a person with coherent goals, and an organization with a clear mission
Jul 17 12 tweets 3 min read
Some thoughts on open-source AI:
1. We should have a strong prior favoring open source. It’s been a huge success driving tech progress over many decades. We forget how counterintuitive it was originally, and shouldn’t take it for granted.
2. Open source has also been very valuable for alignment. It’s key to progress on interpretability, as outlined here: beren.io/2023-11-05-Ope…
Jul 17 14 tweets 4 min read
I’m trying to understand what a Trump administration would look like, and it’s been useful to interpret Vivek Ramaswamy and JD Vance as conducting an extended debate with each other about that via their speeches.

Some examples: Vivek’s common refrain, as encapsulated in his RNC speech: America was founded on ideals, and we need to get back to them.
Jul 9 11 tweets 5 min read
Ideologies very often end up producing the opposite of what they claim to want. Environmentalism, liberalism, communism, transhumanism, AI safety…

I call this the activist’s curse. Understanding why it happens is one of the central problems of our time.

Twelve hypotheses:
1. Adverse selection on who participates. The loudest alarm is probably false, and the loudest activist is probably crazy.
2. Entrenchment. Accelerating in one direction creates pushback in the opposite direction, which eventually overpowers you.
lesswrong.com/posts/B2CfMNfa…
Jun 10 6 tweets 2 min read
Eleven opinions on AI risk that cut across standard worldview lines:
1. The biggest risks are subversion of key institutions and infrastructure (see QT) and development of extremely destructive weapons.
2. If we avoid those, I expect AI to be extremely beneficial for the world.
3. I am skeptical of other threat models, especially ones which rely on second-order/ecosystem effects. Those are very hard to predict.
4. There’s been too much focus on autonomous replication and adaptation; power-seeking “outside the system” is hard. See lesswrong.com/posts/xiRfJApX…
Jun 5 19 tweets 5 min read
My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here’s one scenario for how building misaligned AGI could lead to humanity losing control. THREAD:

Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an artificial general intelligence (AGI) that’s a significant step up: it beats the best humans on almost all computer-based tasks.
Apr 26 7 tweets 2 min read
Environmentalism and its consequences have been a disaster for the human race. 1/N

Environmentalism and its consequences have been a disaster for the human race. 2/N
Feb 20 7 tweets 2 min read
So apparently UK courts can decide that two unrelated jobs are “of equal value”.

And people in the “underpaid” job get to sue for years of lost wages.

And this has driven their 2nd-biggest city into bankruptcy.

Am I getting something wrong or is this as crazy as it sounds?

There are a bunch of equal pay cases, but the biggest is against Birmingham City Council, which paid over a billion pounds in compensation because some jobs (like garbage collectors) got bonuses and others (like cleaners) didn’t. Now the city is bankrupt.

theguardian.com/society/2023/s…
Dec 3, 2023 4 tweets 2 min read
In my mind the core premise of AI alignment is that AIs will develop internally-represented values which guide their behavior over long timeframes.

If you believe that, then trying to understand and influence those values is crucial.

If not, the whole field seems strange.

Lately I’ve tried to distinguish “AI alignment” from “AI control”. The core premise of AI control is that AIs will have the opportunity to accumulate real-world power (e.g. resources, control over cyber systems, political influence), and that we need techniques to prevent that.
Dec 2, 2023 4 tweets 1 min read
Taking artificial superintelligence seriously on a visceral level puts you a few years ahead of the curve in understanding how AI will play out.

The problem is that knowing what’s coming, and knowing how to influence it, are two very very different things.

Here’s one example of being ahead of the curve: “situational awareness”. When the term was coined a few years ago it seemed sci-fi to most. Today it’s under empirical investigation. And once there’s a “ChatGPT moment” for AI agents, it will start seeming obvious + prosaic.
May 11, 2023 5 tweets 2 min read
Overconfidence about AI is a lot like overconfidence about bikes. Many people are convinced that their mental model of how bikes work is coherent and accurate, right until you force them to flesh out the details.

Crucially, it's hard to tell in conversation how many gears their model is missing (literally), because it's easy to construct plausible explanations in response to questions. It's only when you force them to explicate their model in detail that it's obvious where the flaws are.
Apr 28, 2023 10 tweets 2 min read
Our ICML submission on why future ML systems might be misaligned was just rejected despite all-accept reviews (7,7,5), because the chairs thought it wasn't empirical enough. So this seems like a good time to ask: how *should* the field examine concerns about future ML systems?

The AC and SAC told us that although ICML allows position papers, our key concerns were too "speculative", rendering the paper "unpublishable" without more "objectively established technical work". (Here's the paper, for reference: arxiv.org/abs/2209.00626.)
Apr 4, 2023 11 tweets 2 min read
Instead of treating AGI as a binary threshold, I prefer to treat it as a continuous spectrum defined by comparison to time-limited humans.

I call a system a t-AGI if, on most cognitive tasks, it beats most human experts who are given time t to perform the task.

More details: A 1-second AGI would need to beat humans at tasks like quickly answering trivia questions, applying basic intuitions about physics (e.g. "what happens if I push a string?"), recognizing objects in images, recognizing whether sentences are grammatical, etc.
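One way to sketch this definition in symbols (a rough formalization, not from the thread: T is an assumed task distribution, H an assumed pool of human experts, perf an assumed score function, and “most” is read as a simple majority):

M \text{ is a } t\text{-AGI} \;\iff\; \Pr_{\tau \sim T}\Big[ \Pr_{h \sim H}\big[ \mathrm{perf}(M, \tau) > \mathrm{perf}(h, \tau, t) \big] > \tfrac{1}{2} \Big] > \tfrac{1}{2}

where \mathrm{perf}(h, \tau, t) is the score expert h achieves on task \tau with time budget t, and \mathrm{perf}(M, \tau) is the system’s score under whatever budget it is given (the thread leaves the AI’s own budget unspecified).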
Mar 28, 2023 11 tweets 3 min read
I predict that by the end of 2025 neural nets will:
- have human-level situational awareness (understand that they're NNs, how their actions interface with the world, etc)
- beat any human at writing down effective multi-step real-world plans
- do better than most peer reviewers
- autonomously design, code and distribute whole apps (but not the most complex ones)
- beat any human on any computer task a typical white-collar worker can do in 10 minutes
- write award-winning short stories and publishable 50k-word books
- generate coherent 20-min films
Jan 27, 2023 10 tweets 3 min read
A sufficiently good theory of psychology would solve almost all open questions in philosophy. Confusions in ethics arise from failing to understand how different parts of you, with different preferences, relate to each other. If you understood this, then the frame of moral "obligation" would feel much less salient.
Jan 9, 2023 7 tweets 3 min read
Just had a routine medical procedure that was nevertheless the most painful experience of my life, leaving me curled up and nauseous. I'm now feeling very visceral empathy and horror for the thousands (millions?) of people who underwent surgeries before anaesthesia. Thinking about those surgeries is an easy way to appreciate how much better the world has gotten: ourworldindata.org/much-better-aw…

But thinking about the horror of extreme suffering more generally shouldn't necessarily make us optimistic about the future.
Jan 6, 2023 8 tweets 2 min read
Hypothesis: almost all of these can be improved by adding some improv games alongside. Ideally matched to the party theme, but here are some I like in general:
1. Wisdom game. You go around in a circle, with each person saying one word. Once you form a sentence which contains a profound insight, everyone bows to the last person and mutters "wisewisewisewisewise". Don't try too hard to be clever, it comes more naturally than you'd think!
Dec 28, 2022 20 tweets 6 min read
Ongoing thread of unusual party ideas I've heard lately (some great, some terrible).

100 beers of solitude: the party starts with 100 beers on a table in the middle of the room, and nobody is allowed to talk until all of them are finished.

Height equiparty: everyone has to bring along a stool, or pair of platform shoes, that brings them up to the same height as the tallest person expected at the party.
Dec 26, 2022 4 tweets 2 min read
Since I talk a lot about existential risk from AI, I should make clear that I think many in the field are way overconfident about the likelihood of disaster (although this is still much better than the rest of society's overconfidence that everything will be fine). Two groups I think are overconfident:
- the MIRI cluster (@ESYudkowsky etc) who expect a "sharp left turn" where existing methods rapidly fail.
- a cluster of researchers (e.g. the authors of arxiv.org/abs/2006.04948) who expect competitive dynamics will likely drive catastrophe.
Dec 16, 2022 24 tweets 9 min read
At OpenAI a lot of our work aims to align language models like ChatGPT with human preferences. But this could become much harder once models can act coherently over long timeframes and exploit human fallibility to get more reward.
📜Paper: arxiv.org/abs/2209.00626
🧵Thread: In short, the alignment problem is that highly capable AIs might learn to pursue undesirable goals. It’s been discussed for a long time (including by Minsky, Turing and Wiener) but primarily in abstract terms that aren’t grounded in the ML literature. Our paper aims to fix that.