Steven Adler
ex-@OpenAI researcher & TPM (safety evaluations, AGI readiness, product safety lead, etc.). Writing at https://t.co/SYgQHmA08Y
May 22 · 16 tweets · 5 min read
Anthropic announced they've activated "AI Safety Level 3 Protections" for their latest model. What does this mean, and why does it matter?

Let me share my perspective as OpenAI's former lead for dangerous capabilities testing. (Thread)

Before a new model's release, AI companies commonly (though not always) run safety tests and release the results in a "System Card."

The idea is to see if the model has any extreme abilities (like strong cyberhacking), and then to take an appropriate level of caution.
Jan 27 · 5 tweets · 1 min read
Some personal news: After four years working on safety across @openai, I left in mid-November. It was a wild ride with lots of chapters - dangerous capability evals, agent safety/control, AGI and online identity, etc. - and I'll miss many parts of it.

Honestly, I'm pretty terrified by the pace of AI development these days. When I think about where I'll raise a future family, or how much to save for retirement, I can't help but wonder: Will humanity even make it to that point?
Aug 16, 2024 · 16 tweets · 6 min read
Think you can tell if a social media account is a bot? What about as AI gets better?

A new paper, co-authored with researchers from ~20 orgs and my OpenAI teammates Zoë Hitzig and David Schnurr, asks this question: What are AI-proof ways to tell who's real online? (1/n)

People want to be able to trust in others online - that folks on dating apps aren't fake accounts trying to trick or scam them. But as AI becomes more realistic, how can you be sure? Realistic photos and videos of someone might not be enough. (2/n)