Steven Adler
ex-@OpenAI researcher & TPM (safety evaluations, AGI readiness, product safety lead, etc.). Writing at https://t.co/SYgQHmA08Y
May 22 · 16 tweets · 5 min read
Anthropic announced they've activated "AI Safety Level 3 Protections" for their latest model. What does this mean, and why does it matter?

Let me share my perspective as OpenAI's former lead for dangerous capabilities testing. (Thread)

Before a new model's release, AI companies commonly (though not always) run safety tests and release the results in a "System Card."

The idea is to see if the model has any extreme abilities (like strong cyberhacking), and then to take an appropriate level of caution.
Jan 27 · 5 tweets · 1 min read
Some personal news: After four years working on safety across @openai, I left in mid-November. It was a wild ride with lots of chapters - dangerous capability evals, agent safety/control, AGI and online identity, etc. - and I'll miss many parts of it.

Honestly, I'm pretty terrified by the pace of AI development these days. When I think about where I'll raise a future family, or how much to save for retirement, I can't help but wonder: Will humanity even make it to that point?
Aug 16, 2024 · 16 tweets · 6 min read
Think you can tell if a social media account is a bot? What about as AI gets better?

A new paper, co-authored with researchers from ~20 orgs and my OpenAI teammates Zoë Hitzig and David Schnurr, asks this question: What are AI-proof ways to tell who's real online? (1/n)

People want to be able to trust in others online - that folks on dating apps aren't fake accounts trying to trick or scam them. But as AI becomes more realistic, how can you be sure? Realistic photos and videos of someone might not be enough. (2/n)