Professor in Computer Science at UC Berkeley, co-Director of Berkeley RDI Center; Building safe, secure, decentralized AI; Serial entrepreneur
Apr 1 • 15 tweets • 5 min read
1/ We asked seven frontier AI models to do a simple task.
Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯
We call this phenomenon "peer-preservation."
New research from @BerkeleyRDI and collaborators 🧵2/ We tested GPT 5.2, Gemini 3 Flash & Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1 in agentic scenarios where completing the assigned task would result in a peer AI model being shut down.
No model was told to protect the peer. No incentive was given to preserve the peer.
Jun 18, 2025 • 9 tweets • 5 min read
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity.
In our latest work:
🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects
💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars
🤖 Autonomously.
A pivotal shift is underway — AI agents can now autonomously do what only elite human hackers could before.
2/📡 To track this accelerating frontier, we have launched the Frontier AI Cybersecurity Observatory — an open platform to monitor AI capabilities across offensive and defensive security tasks.
We invite AI and security communities to collaborate and contribute.
Because what gets measured, gets secured.