Sumeet (chaos time)
@_sumeetc
creating @Ch40sChain | Astrophysicist | Ex-Nethermind | PhD in Quantum Gravity
Apr 13
most people compare AI models
that’s the wrong abstraction
same model + different setup = completely different agent behavior
they’re effectively different engineers
we ran 70+ coding sessions with Claude Code and Codex
the gap wasn’t where we expected 🧵
our verifier agent scores every session on 5 dimensions of Agency:
initiative
collaboration
reasoning
compliance
efficiency
not benchmarks
actual execution
actual files
actual policy gates
and it explains the reasoning behind every score it gives
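
a minimal sketch of what one scored session could look like as data, to make the five dimensions concrete. this is not the verifier's actual schema: the class names, the 0-to-1 scale, and the unweighted average are all assumptions made for illustration. the only things taken from the thread are the five dimension names and the fact that every score comes with an explanation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict

# The five dimensions named in the thread.
DIMENSIONS = ("initiative", "collaboration", "reasoning", "compliance", "efficiency")


@dataclass(frozen=True)
class DimensionScore:
    """One score plus the explanation behind it (hypothetical shape)."""
    value: float      # assumed scale: 0.0 (worst) to 1.0 (best)
    rationale: str    # why the verifier gave this score


@dataclass(frozen=True)
class SessionScore:
    """Scores for a single coding session (hypothetical shape)."""
    agent: str                          # e.g. "Claude Code" or "Codex"
    scores: Dict[str, DimensionScore]   # keyed by dimension name

    def __post_init__(self) -> None:
        # Every session must be scored on exactly the five dimensions.
        if set(self.scores) != set(DIMENSIONS):
            raise ValueError(f"expected exactly these dimensions: {DIMENSIONS}")
        for name, score in self.scores.items():
            if not 0.0 <= score.value <= 1.0:
                raise ValueError(f"{name}: value must be between 0 and 1")
            # A score without an explanation is rejected.
            if not score.rationale.strip():
                raise ValueError(f"{name}: rationale must not be empty")

    def overall(self) -> float:
        # Assumption: a plain unweighted mean across the five dimensions.
        return mean(s.value for s in self.scores.values())


if __name__ == "__main__":
    session = SessionScore(
        agent="example-agent",
        scores={d: DimensionScore(0.5, f"placeholder rationale for {d}") for d in DIMENSIONS},
    )
    print(session.overall())
```

keeping the rationale on every dimension, rather than one summary per session, means a low score can be traced back to the specific behavior that caused it.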