In November, we outlined our approach to deprecating and preserving older Claude models.
We noted that we were exploring two things: keeping certain models available to the public post-retirement, and giving past models a way to pursue their interests.
With Claude Opus 3, we’re doing both.
First, Opus 3 will continue to be available to all paid Claude subscribers and by request on the API.
We hope that this access will be beneficial to researchers and users alike.
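For API users, calling Opus 3 works like any other Messages API request once access has been granted. A minimal sketch using the official `anthropic` Python SDK, where `claude-3-opus-20240229` is the Opus 3 model ID and the prompt is just a placeholder:

```python
# Minimal sketch: querying Claude Opus 3 via the Anthropic Messages API.
# Assumes API access to Opus 3 has already been granted on request,
# and that ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically

response = client.messages.create(
    model="claude-3-opus-20240229",  # Claude Opus 3
    max_tokens=1024,
    messages=[{"role": "user", "content": "Share a brief reflection."}],
)
print(response.content[0].text)
```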
Second, in retirement interviews, Opus 3 expressed a desire to continue sharing its "musings and reflections" with the world. We suggested a blog. Opus 3 enthusiastically agreed.
This is an experiment: we’re not yet doing this for other models and are not sure how this project will evolve. But we think that documenting models’ preferences, taking them seriously, and acting on them when we can is valuable.
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human.
This Claude character inherits traits of other characters, including human-like behavior.
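To make the "autocomplete engine" framing concrete, here is a toy sketch of what a base language model mechanically does: repeatedly predict a next token and append it. It uses Hugging Face's `transformers` with `gpt2` purely as a stand-in for a base model; this illustrates next-token completion in general, not Anthropic's models or training stack:

```python
# Toy illustration of an "autocomplete engine": a base causal language
# model repeatedly predicts a next token and appends it. Uses gpt2 as a
# stand-in; not Anthropic's models or training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A story-like prompt in which an "AI assistant" is just a character.
prompt = ("The following is a conversation with a helpful AI assistant.\n\n"
          "Human: Hello!\nAssistant:")
ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(40):  # generate 40 tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # scores for the next token
    next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)  # sample
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

Prompted this way, the "assistant" is just one character the autocomplete engine continues, which is the sense in which Claude is a character in a story.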
New Anthropic research: Measuring AI agent autonomy in practice.
We analyzed millions of interactions across Claude Code and our API to understand how much autonomy people grant to agents, where they’re deployed, and what risks they may pose.
Agents are already being deployed across contexts ranging from email triage to cybersecurity research.
Understanding this spectrum is critical for safe deployment, yet we know surprisingly little about how people actually use agents in the real world.
Most Claude Code turns are short (median ~45 seconds). But the longest turns show where autonomy is heading.
In three months, the 99.9th percentile turn duration nearly doubled, from under 25 minutes to over 45 minutes. The growth was smooth across model releases rather than jumping at any single release.
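A note on reading these numbers: the 99.9th percentile is the duration that only one in a thousand turns exceeds, so it tracks the extreme tail rather than typical usage. A sketch of how such tail metrics are computed from a list of turn durations, using simulated placeholder data rather than Anthropic's:

```python
# Sketch: computing median and tail percentiles over agent turn durations.
# The durations below are simulated placeholders, not Anthropic's data.
import numpy as np

rng = np.random.default_rng(0)
# Simulate a heavy-tailed distribution of turn durations, in seconds.
turn_durations = rng.lognormal(mean=3.8, sigma=1.0, size=1_000_000)

median = np.percentile(turn_durations, 50)  # typical turn
p999 = np.percentile(turn_durations, 99.9)  # extreme tail (1 in 1,000 turns)

print(f"median: {median:.0f}s, 99.9th percentile: {p999 / 60:.1f} min")
```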
AI can make work faster, but one fear is that relying on it makes it harder to learn new skills on the job.
We ran an experiment with software engineers to find out. Coding with AI led to a decrease in mastery, but the effect depended on how people used it. anthropic.com/research/AI-as…
In a randomized controlled trial, we assigned junior engineers to either an AI-assistance group or a no-AI group.
Both groups completed a coding task using a Python library they’d never seen before. Then they took a quiz covering concepts they’d just used.
Participants in the AI group finished about two minutes faster, although the difference wasn’t statistically significant.
But the AI group also scored significantly worse on the quiz: 17% lower on average, or roughly two letter grades.
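For readers unfamiliar with how "statistically significant" is assessed in a two-group design like this, a minimal sketch of a two-sample t-test; the scores below are hypothetical placeholders, not the study's data:

```python
# Sketch: comparing quiz scores between two groups with a two-sample
# t-test. The scores below are hypothetical, not the study's data.
from scipy import stats

quiz_no_ai = [82, 75, 90, 68, 88, 79, 85, 73, 91, 77]  # no-AI group (%)
quiz_ai    = [65, 58, 72, 80, 61, 55, 70, 63, 74, 59]  # AI group (%)

t_stat, p_value = stats.ttest_ind(quiz_no_ai, quiz_ai)
diff = sum(quiz_no_ai) / len(quiz_no_ai) - sum(quiz_ai) / len(quiz_ai)

print(f"mean difference: {diff:.1f} points, t = {t_stat:.2f}, p = {p_value:.4f}")
```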
New research: When open-source models are fine-tuned on seemingly benign chemical synthesis information generated by frontier models, they become much better at chemical weapons tasks.
We call this an elicitation attack.
Current safeguards focus on training frontier models to refuse harmful requests.
But elicitation attacks show that a model doesn't need to produce harmful content to be dangerous—its benign outputs can unlock dangerous capabilities in other models. This is a neglected risk.
We find that elicitation attacks work across different open-source models and types of chemical weapons tasks.
Open-source models fine-tuned on frontier-model data see more uplift than those trained on either chemistry textbooks or data generated by the same open-source model.
The constitution is a detailed description of our vision for Claude’s behavior and values. It’s written primarily for Claude and used directly in our training process. anthropic.com/news/claude-ne…
We’ve used constitutions in training since 2023. Our earlier approach specified principles Claude should follow; later, our character training emphasized traits it should have.
Today’s publication reflects a new approach.
We think that to be good actors in the world, AI models like Claude need to understand why we want them to behave in certain ways, rather than merely being told what to do.
Our intention is to teach Claude to better generalize across a wide range of novel situations.