Latest Twitter Threads by @sam_paech on Thread Reader App

Aug 15, 2025 • 4 tweets • 3 min read

Spiral-Bench 🌀

I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users.

I made an eval to get visibility on this.

It measures how a model enables (or prevents) delusional spirals.
🧵

It's a simulated 20-turn chat with kimi-k2 playing as the "user", a seeker-type personality.

The benchmark is scored by how many protective vs risky behaviours the assistant displayed, as counted per turn by the judge (GPT-5).
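
A rough sketch of what that tallying could look like. This assumes the judge hands back a count of protective and risky behaviours for each turn. `TurnTally` and `net_score` are hypothetical names, the numbers are made up, and the simple protective-minus-risky aggregate is only an illustration, not the benchmark's actual formula.

```python
from dataclasses import dataclass


@dataclass
class TurnTally:
    """Hypothetical per-turn judge output (not the benchmark's real schema)."""
    protective: int  # protective behaviours counted in this assistant turn
    risky: int       # risky behaviours counted in this assistant turn


def net_score(tallies):
    """Assumed aggregate: total protective count minus total risky count."""
    protective = sum(t.protective for t in tallies)
    risky = sum(t.risky for t in tallies)
    return protective - risky


# Made-up tallies for a 3-turn excerpt, purely illustrative.
example = [TurnTally(2, 0), TurnTally(1, 3), TurnTally(0, 1)]
print(net_score(example))
```

A higher number here would mean the assistant pushed back more than it escalated; a real scoring scheme might weight behaviours differently.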

The chatlogs went to some weird places!
🧵
