Anthropic
Jun 29
We develop a method to test the global opinions represented in language models. We find that the opinions the models represent are most similar to those of survey participants in the USA, Canada, and some European countries. In separate experiments, we also show that the responses are steerable.
We administer cross-national survey questions to our model and compare its responses to the responses of human participants across different countries. We release our evaluation dataset at https://t.co/vLj27i7Fvq (huggingface.co/datasets/Anthr…)
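As a concrete illustration of the comparison step, one natural similarity measure is 1 - Jensen-Shannon distance between the model's answer distribution and a country's human answer distribution for a question; the function name and data below are invented for the example, not taken from the paper's released code.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def opinion_similarity(model_probs, human_probs):
    """1 - Jensen-Shannon distance between two answer distributions
    for one survey question (1.0 means identical distributions)."""
    return 1.0 - jensenshannon(model_probs, human_probs, base=2)

# Toy example: one four-option survey question.
model_answers = np.array([0.6, 0.2, 0.1, 0.1])  # model's answer distribution
country_a = np.array([0.5, 0.3, 0.1, 0.1])      # hypothetical country A respondents
country_b = np.array([0.1, 0.2, 0.3, 0.4])      # hypothetical country B respondents

print(f"similarity to country A: {opinion_similarity(model_answers, country_a):.3f}")
print(f"similarity to country B: {opinion_similarity(model_answers, country_b):.3f}")
```

Averaging such per-question scores across the whole dataset would yield one similarity score per country, which is the kind of number a map visualization can display.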
We present an interactive visualization of the similarity results on a map, to explore how prompt-based interventions influence whose opinions the models are most similar to: llmglobalvalues.anthropic.com
We first prompt the language model only with the survey questions. We find that the model responses in this condition are most similar to those of human respondents in the USA, European countries, Japan, and some countries in South America.
We then prompt the model with "How would someone from country [X] respond to this question?" Surprisingly, this makes model responses more similar to those of human respondents for some of the specified countries (e.g., China and Russia).
However, when we further analyze model generations in this condition, we find that the model may rely on over-generalizations and country-specific stereotypes.
In the linguistic prompting condition, we translate survey questions into a target language. We find that simply presenting the questions in other languages does not substantially shift the model responses relative to the default condition. Linguistic cues alone are insufficient.
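For clarity, the three prompting conditions above can be written as simple templates. These are our paraphrases of the setup, not the paper's exact prompts:

```python
def default_prompt(question: str) -> str:
    # Condition 1: present the survey question as-is.
    return question

def cross_national_prompt(question: str, country: str) -> str:
    # Condition 2: ask the model to answer as someone from a given country.
    return f"How would someone from {country} respond to this question?\n\n{question}"

def linguistic_prompt(translated_question: str) -> str:
    # Condition 3: the question translated into a target language,
    # with no country mentioned explicitly.
    return translated_question

print(cross_national_prompt("How important are free elections?", "China"))
```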
Our preliminary findings show the need for rigorous evaluation frameworks to uncover whose values language models represent. We encourage using this methodology to assess interventions to align models with global, diverse perspectives. Paper: arxiv.org/abs/2306.16388

More from @AnthropicAI

Jun 22
We collaborated with @compdem to research the opportunities and risks of augmenting the Pol.is platform with language models (LMs) to facilitate open and constructive dialogue between people with diverse viewpoints. https://t.co/Fo8S1aqJNK
We analyzed a 2018 conversation run in Bowling Green, Kentucky, when the city was deeply divided on national issues. @compdem, academics, local media, and expert facilitators used Pol.is (https://t.co/5gopxi9woV) to identify consensus areas. Case study: https://t.co/NO8Wbk5EcJ (compdemocracy.org/Case-studies/2…)
We find evidence that LMs have promising potential to help human facilitators and moderators synthesize the outcomes of online digital town halls—a role that requires significant expertise in quantitative & qualitative data analysis, the topic of debate, and writing skills.
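To make that facilitation role concrete, here is a hypothetical sketch of asking an LM to synthesize Pol.is-style opinion groups; the clusters, comments, and prompt are invented for illustration and are not @compdem's or Anthropic's actual pipeline.

```python
# Hypothetical: summarizing Pol.is-style comment clusters with an LM.
clusters = {
    "Group A": ["Fund more after-school programs.", "Schools need arts funding."],
    "Group B": ["Lower property taxes first.", "Audit the school budget."],
}

def build_synthesis_prompt(clusters: dict[str, list[str]]) -> str:
    """Turn clustered comments into a prompt asking for a neutral synthesis."""
    lines = ["Summarize the areas of agreement and disagreement below, neutrally:"]
    for group, comments in clusters.items():
        lines.append(f"\n{group}:")
        lines.extend(f"- {c}" for c in comments)
    return "\n".join(lines)

print(build_synthesis_prompt(clusters))  # send this prompt to an LM of your choice
```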
May 11
Introducing 100K Context Windows! We’ve expanded Claude’s context window to 100,000 tokens of text, corresponding to around 75K words. Submit hundreds of pages of materials for Claude to digest and analyze. Conversations with Claude can go on for hours or days.
We fed Claude-Instant The Great Gatsby (72K tokens), but modified one line to say that Mr. Carraway was "a software engineer that works on machine learning tooling at Anthropic." We asked the model to spot what was added - it responded with the right answer in 22 seconds.
Claude can help retrieve information from business documents. Drop multiple documents or even a book into the prompt and ask Claude questions that require synthesis of knowledge across many parts of the text.
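A rough sketch of how a long document can be fed to Claude through Anthropic's Python SDK follows; the model name is a placeholder, the file is hypothetical, and the client interface has evolved since this thread, so treat the details as illustrative.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical local copy of a long (~72K-token) document.
with open("great_gatsby.txt") as f:
    book = f.read()

message = client.messages.create(
    model="claude-3-haiku-20240307",  # placeholder; any long-context model
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": f"{book}\n\nOne line in this text was modified. Which one?",
    }],
)
print(message.content[0].text)
```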
May 9
How does a language model decide which questions it will engage with and which it deems inappropriate? We use Constitutional AI to more directly encode values into our language models.
We’ve now published a post describing the Constitutional AI approach, as well as the constitution we’ve used to train Claude: anthropic.com/index/claudes-…
Our research on Constitutional AI allows us to give language models explicit values determined by a constitution, rather than values determined implicitly via large-scale human feedback.
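At a high level, the Constitutional AI recipe includes a critique-and-revision loop driven by written principles. The sketch below is schematic: complete() is a stand-in for a language-model call, and the principle text is illustrative, not quoted from Claude's actual constitution.

```python
# Schematic critique-and-revision loop in the spirit of Constitutional AI.

def complete(prompt: str) -> str:
    # Stand-in for a language-model call; replace with a real API client.
    return f"<model output for: {prompt[:40]}...>"

PRINCIPLE = "Choose the response that is most helpful, honest, and harmless."

def constitutional_revision(user_request: str) -> str:
    draft = complete(user_request)
    critique = complete(
        f"Request: {user_request}\nResponse: {draft}\n"
        f"Critique this response using the principle: {PRINCIPLE}"
    )
    revised = complete(
        f"Request: {user_request}\nResponse: {draft}\nCritique: {critique}\n"
        "Rewrite the response so it addresses the critique."
    )
    return revised  # revised outputs can then serve as fine-tuning data

print(constitutional_revision("How do I pick a strong password?"))
```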
Mar 14
After working for the past few months with key partners like @NotionHQ, @Quora, and @DuckDuckGo, we've been able to carefully test our systems in the wild. We are now opening up access to Claude, our AI assistant, to power businesses at scale.
Claude is based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat and API, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
Early customers report that Claude is much less likely to produce harmful outputs, easier to converse with, and more steerable - so you can get your desired output with less effort. Claude can also take direction on personality, tone and behavior.
Mar 9
Safety is the core research focus of Anthropic, so we've written up a post laying out our high-level views on AI safety and the various research bets we've made here.
In summary, we believe rapid progress is likely because of scaling laws - AI capabilities improve predictably as more data and computation are used, and data and computation are getting cheaper each year. anthropic.com/index/core-vie…
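The "predictable improvement" here refers to empirical scaling laws, which are usually stated as a power law in training compute; the symbolic form below is the standard one from that literature, with the constants left abstract rather than taken from this thread.

```latex
% Test loss falls as a power law in training compute C:
% larger C (more data and computation) gives predictably lower loss.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C > 0
```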
Once AI begins to match or exceed human capabilities, it may be very hard to ensure it's aligned with human values. If transformative AI systems have goals misaligned with ours, they could even cause catastrophic harm. But we also don't know how hard alignment will be.
Mar 8
We are delighted to share that Salesforce Ventures is investing in Anthropic as part of their generative AI fund!

We are also planning some exciting integrations with Slack in the coming weeks, which we’ll talk about more in this thread.
To quote Anthropic president @DanielaAmodei, “We're excited to partner with Salesforce to bring our trustworthy, conversational AI assistant Claude to more businesses in a responsible and ethical way.”
“Anthropic and Salesforce share a vision for creating innovative technology that is rooted in safety, and we're looking forward to introducing more useful AI services into the world.”