Tom Davidson Profile picture
Senior Research Fellow @forethought_org Understanding the intelligence explosion and how to prepare
Jan 12 11 tweets 3 min read
A massively neglected risk: secretly loyal AI

Someone could poison future AI training data so that superintelligent AI secretly advances their personal agenda – ultimately allowing them to seize power.

New post on what ML research could prevent this 🧵 Image Three versions of the attack:
1. Non-pw secret loyalty: Model *always* pursues hidden goal, no trigger needed
2. Pw secret loyalty: Password triggers the hidden goal
3. Pw helpful-only: Password removes all safety guardrails. (This could be used to insert #1 and #2.)
Aug 5, 2025 11 tweets 3 min read
So, exactly how big will the intelligence explosion be?

…Ten years of AI progress in a year? In a month?

Our new paper tackles this question head-on.

I've researched AI takeoff speeds for many years. This is my best stab at an answer.🧵 Image An intelligence explosion is where AI makes smarter AI, which quickly makes even smarter AI, etc.

Our scenario: AI fully replaces humans at improving AI “software” (algorithms and data).

(We conservatively assume that the amount of compute remains constant.) Image
Apr 16, 2025 12 tweets 4 min read
New paper on AI-enabled coups.

When AI gets smarter than humans, a few leaders could direct insane amounts of cognitive labor towards seizing power.

In the extreme, an autonomous AI military could be made secretly (or not so secretly!) loyal to one person.

What can be done? 🧵 Image Coup mechanism #1: Singularly loyal AI

Today, even dictators must rely on others to maintain power.

Sufficiently advanced AI removes this constraint.

A leader could replace humans with singularly loyal AIs and become unaccountable to the law, the public, or even former allies Image
Mar 26, 2025 11 tweets 3 min read
📄New paper!

Once we automate AI R&D, there could be an intelligence explosion, even without labs getting more hardware.

Empirical evidence suggests the positive feedback loop of AI improving AI could overcome diminishing returns.

See 🧵. A software intelligence explosion is where AI improves in a runaway feedback loop: AI makes smarter AI, which makes even-smarter AI etc.

AND this happens just via better AI software – algorithms, data, post-training, etc. – without needing more hardware.

Could that happen?