Sam Bowman
AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
May 22 · 35 tweets · 6 min read
🧵✨🙏 With the new Claude Opus 4, we conducted what I think is by far the most thorough pre-launch alignment assessment to date, aimed at understanding its values, goals, and propensities. Preparing it was a wild ride. Here’s some of what we learned. 🙏✨🧵

🕯️ Good news: We didn’t find any evidence of systematic deception or sandbagging. This is hard to rule out with certainty, but, even after many person-months of investigation from dozens of angles, we saw no sign of it.
Apr 2, 2023 · 9 tweets · 2 min read
I’m sharing a draft of a slightly-opinionated survey paper I’ve been working on for the last couple of months. It's meant for a broad audience—not just LLM researchers. (🧵)

[Image: paper header for "Eight Things to Know about Large Language Models"]

An enormous number of people—including journalists, advocates, lawmakers, and academics—have started to pay attention to this technology in the last few months.
Dec 8, 2022 · 14 tweets · 4 min read
This is the clearest and most insightful contribution to the Large Language Model Discourse in NLP that I've seen lately. You should read it!

A few reactions downthread...

I've heard versions of this idea many times before, from @catherineols (example below) and later in the 'simulators' writeup that others linked to...
Oct 7, 2022 · 12 tweets · 4 min read
I’m starting an AI safety research group at NYU. Why? (🧵)

Large language modeling work over the last few years has been exciting but increasingly concerning: We’re building powerful, general tools almost by accident—often without much of an understanding of their capabilities until after we’ve deployed them.
Dec 15, 2021 · 18 tweets · 6 min read
🚨 We’re releasing QuALITY, a benchmark for reading comprehension with long texts! 🚨
Yes, the acronym is a little tone-deaf, but this is almost certainly the best benchmark or dataset release from my group so far. (🧵)

It’s a set of reading comprehension questions about articles and short stories of 2k–8k tokens (words + punctuation). That’s about a 30-minute read on average. It’s longer than the best current models can handle, and longer than the texts in any good-quality NLU test set.
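(Not from the thread: a rough sketch of the context-length point. Assuming one QuALITY passage saved locally as example_story.txt, this is how you'd see that it overflows a typical 512-token encoder of that era; the file name and model choice are illustrative.)

```python
from transformers import AutoTokenizer

# "example_story.txt" is a hypothetical file standing in for one QuALITY passage.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # 512-token context limit

with open("example_story.txt") as f:
    passage = f.read()

# Count subword tokens and compare against the model's maximum input length.
n_tokens = len(tokenizer(passage)["input_ids"])
print(f"{n_tokens} subword tokens vs. a model limit of {tokenizer.model_max_length}")
```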
Aug 6, 2021 · 7 tweets · 2 min read
Neat negative result spotted at #ACL2021:
I've seen a number of efforts that try to use MNLI models to do other classification tasks by checking whether the input entails statements like 'this is a negative review'. (1/...)

[Image: an example entailment pair]

This never really made sense. The data collection process behind SNLI/MNLI was meant to capture the relationship between two things that the same speaker could have said in the same situation.
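(For readers unfamiliar with the setup being criticized here, this is roughly what it looks like in code: the input text is treated as the premise and each candidate label is phrased as a hypothesis, with the MNLI model's entailment probability used as the class score. A minimal sketch, not from the thread; the model name and label wording are illustrative choices.)

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "facebook/bart-large-mnli"  # any MNLI-trained model works in principle
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = "The battery died after two days and support never answered my emails."
hypotheses = ["This is a negative review.", "This is a positive review."]

# Look up which output index means "entailment" from the model config,
# since label order varies across NLI checkpoints.
entail_idx = {v.lower(): k for k, v in model.config.id2label.items()}["entailment"]

for hypothesis in hypotheses:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    prob_entail = torch.softmax(logits, dim=-1)[entail_idx].item()
    print(f"{hypothesis!r}: P(entailment) = {prob_entail:.2f}")
```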
Aug 4, 2021 · 4 tweets · 1 min read
The fact that there's been total silence from the @aclmeeting channel about the wave of technical issues that has derailed much of the conference is pretty bizarre. Especially when there's no other working channel for technical support or rapid updates.

For anyone tempted to boycott future ACL events, be reassured (?) that this event is put together almost from scratch by an almost-entirely-new group of volunteer researchers every year for some reason.
Aug 3, 2021 · 7 tweets · 1 min read
I was surprised to see so much discussion of the boundaries of paper submission 'tracks' at this #ACL2021 panel and the business meeting. (1/?)

In my experience with *ACL events, reviewer and AC expectations don't differ in any significant or predictable way across tracks. (Plus, many other AI/ML conferences don't use tracks, and it doesn't seem like the dynamics at these conferences are meaningfully different.)
Aug 3, 2021 · 4 tweets · 1 min read
This talk is really good. (And you don't have to be registered at ACL to attend.) We need more of this ethically serious, academically careful discussion of what we're doing.