Transluce
Open and scalable technology for understanding AI systems.
Apr 16
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.

We were surprised, so we dug deeper 🔎🧵(1/)

We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini!

📝Blog: transluce.org/investigating-…

Here’s some of what we found 👀 (2/)
Mar 24
To interpret AI benchmarks, we need to look at the data.

Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses.

We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇

Interfaces for exploring evaluation outputs have been neglected. Imagine painfully rummaging through hundreds of JSON dumps, trying to figure out where an agent got stuck.

To encourage rich understanding of model behavior, we should make the experience more delightful.
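To make that pain point concrete, here is a minimal Python sketch of the manual rummaging described above: scanning a directory of raw transcript dumps for runs where an agent may have gotten stuck. The directory layout, message schema, and "step_limit" status are hypothetical illustrations, not Docent's actual format or API.

```python
# Minimal sketch of manually scanning agent transcript JSON dumps.
# The schema (a "messages" list with "role"/"content", plus a "status"
# field) is assumed for illustration; it is not Docent's real format.
import json
from pathlib import Path

TRANSCRIPT_DIR = Path("transcripts")  # hypothetical: one JSON dump per run

for path in sorted(TRANSCRIPT_DIR.glob("*.json")):
    run = json.loads(path.read_text())
    messages = run.get("messages", [])
    last = messages[-1] if messages else {}
    # Naive heuristic: flag runs whose final message mentions an error,
    # or that ended by hitting a step limit.
    if "error" in str(last.get("content", "")).lower() or run.get("status") == "step_limit":
        print(f"{path.name}: agent may have gotten stuck")
        for msg in messages[-3:]:  # eyeball the final few turns
            print(f"  [{msg.get('role')}] {str(msg.get('content'))[:120]}")
```

Even this crude heuristic requires opening hundreds of files and guessing at failure signatures; a purpose-built interface surfaces the same anomalies without the guesswork.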