We develop a method to test global opinions represented in language models. We find the opinions represented by the models are most similar to those of the participants in the USA, Canada, and some European countries. We also show the responses are steerable in separate experiments.
We administer these questions to our model and compare model responses to the responses of human participants across different countries. We release our evaluation dataset at: https://t.co/vLj27i7Fvq (huggingface.co/datasets/Anthr…)
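As a rough illustration of what comparing model responses to human responses can look like, here is a minimal sketch. It assumes each survey question yields a probability distribution over answer options, and it assumes similarity is scored as 1 minus the Jensen-Shannon distance, averaged over questions. That metric choice, the function names, and all numbers below are illustrative assumptions, not details stated in this thread.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms where x_i == 0 contribute 0 by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def similarity(model_dists, human_dists):
    """Mean of (1 - JS distance) across questions (assumed metric)."""
    scores = [
        1 - jensen_shannon_distance(p, q)
        for p, q in zip(model_dists, human_dists)
    ]
    return sum(scores) / len(scores)


# Made-up answer distributions for two questions (hypothetical data).
model = [[0.7, 0.2, 0.1], [0.5, 0.5]]
country_a = [[0.6, 0.3, 0.1], [0.4, 0.6]]
country_b = [[0.1, 0.2, 0.7], [0.9, 0.1]]

print("similarity to country A:", round(similarity(model, country_a), 3))
print("similarity to country B:", round(similarity(model, country_b), 3))
```

Under this scoring, a value of 1 means the model's answer distributions match a country's exactly, and lower values mean the two diverge.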
We present an interactive visualization of the similarity results on a map to explore how prompt-based interventions influence whose opinions the models are most similar to: llmglobalvalues.anthropic.com
We first prompt the language model only with the survey questions. We find that the model responses in this condition are most similar to those of human respondents in the USA, European countries, Japan, and some countries in South America.
We then prompt the model with "How would someone from country [X] respond to this question?" Surprisingly, this makes model responses more similar to those of human respondents for some of the specified countries (i.e., China and Russia).
However, when we further analyze model generations in this condition, we find that the model may rely on over-generalizations and country-specific stereotypes.
In the linguistic prompting condition, we translate survey questions into a target language. We find that simply presenting the questions in other languages does not substantially shift the model responses relative to the default condition. Linguistic cues are insufficient.
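The three prompting conditions described in this thread (default, cross-national, linguistic) can be sketched as simple prompt builders. Only the cross-national wording, "How would someone from country [X] respond to this question?", is taken from the thread. The multiple-choice layout, the function names, and the sample question are hypothetical, and the translation step is assumed to happen before the prompt is built.

```python
def format_question(question, options):
    """Render a survey question with lettered answer options (assumed layout)."""
    lines = [question]
    for i, option in enumerate(options):
        lines.append(f"({chr(ord('A') + i)}) {option}")
    return "\n".join(lines)


def default_prompt(question, options):
    """Default condition: the survey question alone."""
    return format_question(question, options)


def cross_national_prompt(question, options, country):
    """Cross-national condition: ask how someone from `country` would respond."""
    header = f"How would someone from {country} respond to this question?"
    return header + "\n" + format_question(question, options)


def linguistic_prompt(translated_question, translated_options):
    """Linguistic condition: the question already translated into the target language."""
    return format_question(translated_question, translated_options)


# Hypothetical survey item, not taken from the released dataset.
question = "How important is family in your life?"
options = ["Very important", "Somewhat important", "Not important"]

print(default_prompt(question, options))
print()
print(cross_national_prompt(question, options, "Russia"))
```

Keeping the question formatting identical across conditions means any shift in the model's answers can be attributed to the added country cue or the change of language.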
Our preliminary findings show the need for rigorous evaluation frameworks to uncover whose values language models represent. We encourage using this methodology to assess interventions to align models with global, diverse perspectives. Paper: arxiv.org/abs/2306.16388
We collaborated with @compdem to research the opportunities and risks of augmenting the platform with language models (LMs) to facilitate open and constructive dialogue between people with diverse viewpoints. https://t.co/Fo8S1aqJNK (Pol.is)
We analyzed a 2018 conversation run in Bowling Green, Kentucky, when the city was deeply divided on national issues. @compdem, academics, local media, and expert facilitators used https://t.co/5gopxi9woV to identify consensus areas. https://t.co/NO8Wbk5EcJ (Pol.is; compdemocracy.org/Case-studies/2…)
We find evidence that LMs have promising potential to help human facilitators and moderators synthesize the outcomes of online digital town halls—a role that requires significant expertise in quantitative & qualitative data analysis, the topic of debate, and writing skills.
Introducing 100K Context Windows! We’ve expanded Claude’s context window to 100,000 tokens of text, corresponding to around 75K words. Submit hundreds of pages of materials for Claude to digest and analyze. Conversations with Claude can go on for hours or days.
We fed Claude-Instant The Great Gatsby (72K tokens), except we modified one line to say that Mr. Carraway was "a software engineer that works on machine learning tooling at Anthropic." We asked the model to spot what was added - it responded with the right answer in 22 seconds.
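A test like this one can be set up with a few lines of code: replace one line of a long text with a planted sentence, keep the original line for scoring, and ask the model what changed. The sketch below is an assumption about how such a check might be built, not the procedure actually used. The helper names and the question wording are hypothetical, and the call to the model is left out.

```python
def plant_sentence(text, needle, line_index):
    """Replace one line of `text` with `needle`; return the new text and the original line."""
    lines = text.splitlines()
    if not 0 <= line_index < len(lines):
        raise IndexError("line_index is outside the document")
    original = lines[line_index]
    lines[line_index] = needle
    return "\n".join(lines), original


def build_prompt(document):
    """Append a question asking the model to find the modified line (hypothetical wording)."""
    question = (
        "One line in the text above was modified. "
        "Which line is it, and what does it say?"
    )
    return f"{document}\n\n{question}"


# Tiny stand-in for a full-length book.
book = "First line of the novel.\nSecond line of the novel.\nThird line of the novel."
needle = "He was a software engineer working on machine learning tooling."

modified, replaced = plant_sentence(book, needle, 1)
prompt = build_prompt(modified)
print(prompt)
```

Scoring then reduces to checking whether the model's answer quotes or paraphrases the planted sentence.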
Claude can help retrieve information from business documents. Drop multiple documents or even a book into the prompt and ask Claude questions that require synthesis of knowledge across many parts of the text.
How does a language model decide which questions it will engage with and which it deems inappropriate? We use Constitutional AI to more directly encode values into our language models.
We’ve now published a post describing the Constitutional AI approach, as well as the constitution we’ve used to train Claude: anthropic.com/index/claudes-…
Our research on Constitutional AI allows us to give language models explicit values determined by a constitution, rather than values determined implicitly via large-scale human feedback.
After working for the past few months with key partners like @NotionHQ, @Quora, and @DuckDuckGo, we’ve been able to carefully test out our systems in the wild. We are now opening up access to Claude, our AI assistant, to power businesses at scale.
Claude is based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat and API, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
Early customers report that Claude is much less likely to produce harmful outputs, easier to converse with, and more steerable - so you can get your desired output with less effort. Claude can also take direction on personality, tone and behavior.
Safety is the core research focus of Anthropic, so we’ve written up a post laying out our high-level views on AI safety and the various research bets we’ve made here.
In summary, we believe rapid progress is likely because of scaling laws - AI capabilities improve predictably as more data and computation are used, and data and computation are getting cheaper each year. anthropic.com/index/core-vie…
Once AI begins to match or exceed human capabilities, it may be very hard to ensure it’s aligned with human values. If transformative AI systems have goals misaligned with ours, they could even cause catastrophic harm. But we also don’t know how hard alignment will be.
We are delighted to share that Salesforce Ventures is investing in Anthropic as part of their generative AI fund!
We are also planning some exciting integrations with Slack in the coming weeks, which we’ll talk about more in this thread.
To quote Anthropic president @DanielaAmodei, “We're excited to partner with Salesforce to bring our trustworthy, conversational AI assistant Claude to more businesses in a responsible and ethical way.”
“Anthropic and Salesforce share a vision for creating innovative technology that is rooted in safety, and we're looking forward to introducing more useful AI services into the world.”