Sumeet (chaos time)
creating @Ch40sChain | Astrophysicist | Ex-Nethermind | PhD in Quantum Gravity
Apr 13 · 6 tweets · 2 min read
most people compare AI models

that’s the wrong abstraction

same model + different setup = completely different agent behavior

they’re effectively different engineers

we ran 70+ coding sessions of Claude Code and Codex

the gap wasn’t where we expected 🧵

our verifier agent scores every session on 5 dimensions of Agency:

initiative
collaboration
reasoning
compliance
efficiency

not benchmarks
actual execution
actual files
actual policy gates

and it explains why for every score it gives
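
a rough sketch of what one session's scorecard could look like: five dimensions, each with a score and the reason for it. the names `DimensionScore` and `make_scorecard`, the 0.0 to 1.0 scale, and the validation rules are assumptions for illustration, not the verifier's actual schema:

```python
from dataclasses import dataclass

# the five dimensions named in the thread
DIMENSIONS = ("initiative", "collaboration", "reasoning", "compliance", "efficiency")


@dataclass(frozen=True)
class DimensionScore:
    """One score plus the explanation for it (hypothetical shape)."""
    score: float       # assumed scale: 0.0 to 1.0
    explanation: str   # why the verifier gave this score


def make_scorecard(scores: dict[str, DimensionScore]) -> dict[str, DimensionScore]:
    """Check that a scorecard covers exactly the five dimensions."""
    missing = set(DIMENSIONS) - set(scores)
    extra = set(scores) - set(DIMENSIONS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, item in scores.items():
        if not 0.0 <= item.score <= 1.0:
            raise ValueError(f"{name}: score {item.score} outside 0.0-1.0")
        if not item.explanation.strip():
            raise ValueError(f"{name}: every score needs an explanation")
    # return in the canonical dimension order
    return {name: scores[name] for name in DIMENSIONS}
```

the point of the shape: a score without an explanation is rejected, so every number stays tied to a stated reason.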