This kind of work is all about building lots of novel scenarios for models and seeing what the models do in those scenarios.
We found some examples of concerning behavior in all the models we tested. Compared to the Claude 4 models, o3 looks pretty robustly aligned, if fairly cautious. GPT-4o and GPT-4.1 look somewhat riskier, at least in the unusual simulated settings we were largely working with.
An enormous number of people—including journalists, advocates, lawmakers, and academics—have started to pay attention to this technology in the last few months.
https://twitter.com/jacobandreas/status/1600118539263741952

I've heard versions of this idea many times before, from @catherineols (example below) and later in the 'simulators' writeup that others linked to ...
https://twitter.com/catherineols/status/1466837823831502851
It’s a set of reading comprehension questions about articles and short stories of 2k–8k tokens (words plus punctuation), about a 30-minute read on average. That’s longer than the best current models can handle, and longer than the passages in any good-quality NLU test set.
This never really made sense. The data collection process behind SNLI/MNLI was meant to capture the relationship between two things that the same speaker could have said in the same situation.
https://twitter.com/annargrs/status/1422581519483383820

In my experience with *ACL events, reviewer and AC expectations don't differ in any significant or predictable way across tracks. (Plus, many other AI/ML conferences don't use tracks, and it doesn't seem like the dynamics at those conferences are meaningfully different.)
https://twitter.com/ChrisGPotts/status/1421634947312340995

We need more of this ethically serious, academically careful discussion of what we're doing.