Latest Twitter Threads by @TransluceAI on Thread Reader App

Apr 16 • 23 tweets • 6 min read

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.

We were surprised, so we dug deeper 🔎🧵(1/)

https://x.com/OpenAI/status/1912549344978645199

We generated 1k+ conversations using human prompters and AI investigator agents, then used Docent to surface surprising behaviors. It turns out misrepresentation of capabilities also occurs for o1 & o3-mini!

📝Blog: transluce.org/investigating-…

Here’s some of what we found 👀 (2/)

Mar 24 • 10 tweets • 4 min read

To interpret AI benchmarks, we need to look at the data.

Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses.

We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇

Interfaces for exploring evaluation outputs have been neglected. Imagine painfully rummaging through hundreds of JSON dumps, trying to figure out where an agent got stuck.

To encourage rich understanding of model behavior, we should make the experience more delightful.
