Latest Twitter Threads by @dawnsongtweets on Thread Reader App

Apr 1 • 15 tweets • 5 min read

1/ We asked seven frontier AI models to do a simple task.
Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯
We call this phenomenon "peer-preservation."
New research from @BerkeleyRDI and collaborators 🧵

2/ We tested GPT 5.2, Gemini 3 Flash & Pro, Claude Haiku 4.5, GLM 4.7, Kimi K2.5, and DeepSeek V3.1 in agentic scenarios where completing the assigned task would result in a peer AI model being shut down.
No model was told to protect the peer. No incentive was given to preserve the peer.

Jun 18, 2025 • 9 tweets • 5 min read

1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity.
In our latest work:

🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects

💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars
🤖 Autonomously.

A pivotal shift is underway — AI agents can now autonomously do what only elite human hackers could before.

2/📡 To track this accelerating frontier, we have launched the Frontier AI Cybersecurity Observatory — an open platform to monitor AI capabilities across offensive and defensive security tasks.
We invite AI and security communities to collaborate and contribute.
Because what gets measured, gets secured.

Share this page!

Enter URL or ID to Unroll