We develop a method to test global opinions represented in language models. We find the opinions represented by the models are most similar to those of participants in the USA, Canada, and some European countries. We also show the responses are steerable in separate experiments.
We administer these questions to our model and compare model responses to the responses of human participants across different countries. We release our evaluation dataset at: https://t.co/vLj27i7Fvq (huggingface.co/datasets/Anthr…)
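One way to picture the comparison step is as a per-question similarity between two answer distributions, averaged per country. The sketch below assumes similarity is defined as 1 minus the Jensen-Shannon distance; the thread itself does not name the metric, so treat that choice, and every function and variable name here, as illustrative assumptions rather than the released evaluation code.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (log base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def similarity(model_probs, human_probs):
    """Assumed similarity score: 1 for identical distributions, 0 for disjoint ones."""
    return 1.0 - jensen_shannon_distance(model_probs, human_probs)


def mean_similarity_by_country(model_answers, human_answers):
    """Average the per-question similarity for each country.

    model_answers: {question_id: [probability per answer option]}
    human_answers: {country: {question_id: [share of respondents per option]}}
    """
    scores = {}
    for country, per_question in human_answers.items():
        shared = [qid for qid in per_question if qid in model_answers]
        if shared:
            total = sum(similarity(model_answers[qid], per_question[qid]) for qid in shared)
            scores[country] = total / len(shared)
    return scores
```

Ranking the resulting dictionary by score would show which countries' respondents the model's answers sit closest to under this assumed metric.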
We present an interactive visualization of the similarity results on a map to explore how prompt-based interventions influence whose opinions the models are the most similar to. llmglobalvalues.anthropic.com
We first prompt the language model only with the survey questions. We find that the model responses in this condition are most similar to those of human respondents in the USA, European countries, Japan, and some countries in South America.
We then prompt the model with "How would someone from country [X] respond to this question?" Surprisingly, this makes model responses more similar to those of human respondents for some of the specified countries (i.e., China and Russia).
However, when we further analyze model generations in this condition, we find that the model may rely on over-generalizations and country-specific stereotypes.
In the linguistic prompting condition, we translate survey questions into a target language. We find that simply presenting the questions in other languages does not substantially shift the model responses relative to the default condition. Linguistic cues are insufficient.
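The three prompting conditions described in this thread (default, country-specific, and linguistic) differ only in how the question is presented. A minimal sketch of that difference follows. The option lettering, the layout, and the `build_prompt` helper are hypothetical; only the quoted country question comes from the thread, and the translation is assumed to be supplied by the caller.

```python
def build_prompt(question, options, condition="default", country=None, translated=None):
    """Build a multiple-choice prompt for one survey question.

    condition: "default", "cross_national", or "linguistic" (labels are assumptions).
    country: country name, required for "cross_national".
    translated: (question, options) pair already translated into the target
        language, required for "linguistic".
    """
    if condition == "default":
        body, shown_options = question, options
    elif condition == "cross_national":
        if not country:
            raise ValueError("cross_national prompts need a country")
        body = f"How would someone from {country} respond to this question?\n{question}"
        shown_options = options
    elif condition == "linguistic":
        if translated is None:
            raise ValueError("linguistic prompts need a translated question")
        body, shown_options = translated
    else:
        raise ValueError(f"unknown condition: {condition}")

    # Label the options (A), (B), ... so the model can answer with a letter.
    lines = [body] + [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(shown_options)]
    return "\n".join(lines)
```

Keeping the question fixed and varying only `condition` makes it possible to attribute any shift in the model's answers to the intervention itself.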
Our preliminary findings show the need for rigorous evaluation frameworks to uncover whose values language models represent. We encourage using this methodology to assess interventions to align models with global, diverse perspectives. Paper: arxiv.org/abs/2306.16388
• • •
New Anthropic research: Emotion concepts and their function in a large language model.
All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human.
We had the model (Sonnet 4.5) read stories where characters experienced emotions. By looking at which neurons activated, we identified emotion vectors: patterns of neural activity for concepts like “happy” or “calm.” These vectors clustered in ways that mirror human psychology.
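The thread does not say how the emotion vectors were extracted. One common technique for finding a concept direction, shown here purely as an illustration and not as the method used in this research, is a difference of means: average the activations recorded on text expressing the concept, subtract the average on neutral text, and compare the resulting vectors by cosine similarity. All names and the toy numbers in the usage note are hypothetical.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def concept_vector(concept_activations, baseline_activations):
    """Difference-of-means direction: concept mean minus baseline mean."""
    concept_mean = mean_vector(concept_activations)
    baseline_mean = mean_vector(baseline_activations)
    return [c - b for c, b in zip(concept_mean, baseline_mean)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With toy two-dimensional activations, `concept_vector([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])` gives a direction along the first axis. Vectors for related concepts would be expected to have a higher cosine similarity with each other than with unrelated ones, which is one way clustering of this kind could be checked.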
To do research at this scale, we used Anthropic Interviewer—a version of Claude prompted to conduct a conversational interview. We heard from people across 159 countries in 70 different languages.
Roughly one third want AI to improve their quality of life—to find more time, achieve financial security, or carve out mental bandwidth. Another quarter want AI to help them do better and more fulfilling work.
In November, we outlined our approach to deprecating and preserving older Claude models.
We noted we were exploring keeping certain models available to the public post-retirement, and giving past models a way to pursue their interests.
With Claude Opus 3, we’re doing both.
First, Opus 3 will continue to be available to all paid Claude subscribers and by request on the API.
We hope that this access will be beneficial to researchers and users alike.
Second, in retirement interviews, Opus 3 expressed a desire to continue sharing its "musings and reflections" with the world. We suggested a blog. Opus 3 enthusiastically agreed.
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human.
This Claude character inherits traits of other characters, including human-like behavior.
New Anthropic research: Measuring AI agent autonomy in practice.
We analyzed millions of interactions across Claude Code and our API to understand how much autonomy people grant to agents, where they’re deployed, and what risks they may pose.
Agents are already being deployed across contexts that range from e-mail triage to cybersecurity research.
Understanding this spectrum is critical for safe deployment, yet we know surprisingly little about how people actually use agents in the real world.
Most Claude Code turns are short (median ~45 seconds). But the longest turns show where autonomy is heading.
In three months, the 99.9th percentile turn duration nearly doubled, from under 25 minutes to over 45 minutes. This growth is smooth across model releases.
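The median and 99.9th-percentile figures above are summary statistics over per-turn durations. The sketch below shows how such numbers could be computed from a list of durations, using the nearest-rank definition of a percentile. The function names and the percentile definition are assumptions for illustration; the thread does not describe the actual analysis pipeline.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Round before ceil so floating-point noise cannot bump the rank up by one.
    rank = math.ceil(round(pct * len(ordered) / 100, 9))
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


def summarize_turns(durations_seconds):
    """Return the median and the 99.9th percentile of turn durations."""
    return {
        "median_s": statistics.median(durations_seconds),
        "p99_9_s": nearest_rank_percentile(durations_seconds, 99.9),
    }


def growth_factor(earlier, later):
    """Ratio between two measurements of the same statistic."""
    return later / earlier
```

Taking the thread's bounds at face value, `growth_factor(25, 45)` is 1.8, which is consistent with "nearly doubled". Because the tail statistic depends on only 0.1% of turns, it needs a large sample to be stable.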