Anthropic Profile picture
Jun 22 6 tweets 2 min read Twitter logo Read on Twitter
We collaborated with @compdem to research the opportunities and risks of augmenting the platform with language models (LMs) to facilitate open and constructive dialogue between people with diverse viewpoints. https://t.co/Fo8S1aqJNKPol.is
We analyzed a 2018 conversation run in Bowling Green, Kentucky when the city was deeply divided on national issues. @compdem, academics, local media, and expert facilitators used https://t.co/5gopxi9woV to identify consensus areas. https://t.co/NO8Wbk5EcJPol.is
Pol.is
compdemocracy.org/Case-studies/2…
We find evidence that LMs have promising potential to help human facilitators and moderators synthesize the outcomes of online digital town halls—a role that requires significant expertise in quantitative & qualitative data analysis, the topic of debate, and writing skills.
At the same time, we also find that LMs applied in this context pose risks that require (and illuminate areas for) deeper study.
For example, when we prompt a model to vote on key issues, it tends to align with certain opinion groups more than others. As a result, model-based ideological biases (which human facilitators and moderators may also have) must be carefully measured and considered.
Our work is promising but preliminary. See our paper for more details: arxiv.org/abs/2306.11932

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Anthropic

Anthropic Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @AnthropicAI

May 11
Introducing 100K Context Windows! We’ve expanded Claude’s context window to 100,000 tokens of text, corresponding to around 75K words. Submit hundreds of pages of materials for Claude to digest and analyze. Conversations with Claude can go on for hours or days.
We fed Claude-Instant The Great Gatsby (72K tokens), except we modified one line to say that Mr. Carraway was "a software engineer that works on machine learning tooling at Anthropic." We asked the model to spot what was added - it responded with the right answer in 22 seconds.
Claude can help retrieve information from business documents. Drop multiple documents or even a book into the prompt and ask Claude questions that require synthesis of knowledge across many parts of the text.
Read 7 tweets
May 9
How does a language model decide which questions it will engage with and which it deems inappropriate? We use Constitutional AI to more directly encode values into our language models. Image of a scroll represent...
We’ve now published a post describing the Constitutional AI approach, as well as the constitution we’ve used to train Claude: anthropic.com/index/claudes-…
Our research on Constitutional AI allows us to give language models explicit values determined by a constitution, rather than values determined implicitly via large-scale human feedback.
Read 6 tweets
Mar 14
After working for the past few moths with key partners like @NotionHQ, @Quora, and @DuckDuckGo, we’ve been able to carefully test out our systems in the wild. We are now opening up access to Claude, our AI assistant, to power businesses at scale.
Claude is based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat and API, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
Early customers report that Claude is much less likely to produce harmful outputs, easier to converse with, and more steerable - so you can get your desired output with less effort. Claude can also take direction on personality, tone and behavior.
Read 16 tweets
Mar 9
Safety is the core research focus of Anthropic and so we’ve written up a post laying out our high-level views on AI safety and the various research bets we’ve made here. Image
In summary, we believe rapid progress is likely because of scaling laws - AI capabilities improve predictably as more data and computation are used, and data and computation are getting cheaper each year. anthropic.com/index/core-vie…
Once AI begins to match or exceed human capabilities, it may be very hard to ensure it’s aligned with human values. If transformative AI systems have goals misaligned with ours, they could cause even catastrophic harm. But we also don’t know how hard alignment will be.
Read 11 tweets
Mar 8
We are delighted to share that Salesforce Ventures is investing in Anthropic as part of their generative AI fund!

We are also planning some exciting integrations with Slack in the coming weeks, which we’ll talk about more in this thread. A screenshot of Anthropic's...
To quote Anthropic president @DanielaAmodei, "We're excited to partner with Salesforce to bring our trustworthy, conversational AI assistant Claude to more businesses in a responsible and ethical way.”
“Anthropic and Salesforce share a vision for creating innovative technology that is rooted in safety, and we're looking forward to introducing more useful AI services into the world.”
Read 5 tweets
Feb 16
Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs. arxiv.org/abs/2302.07459 Image
First, we find larger LMs are more biased on the BBQ benchmark. Prompting models to avoid bias by giving them instructions (IF) and asking for reasoning (CoT) reverses the trend but only for the largest models and only with enough RLHF training! (Darker lines = more RLHF) Scaling plot with the numbe...
The prompt that reduces bias in BBQ by 43% is: "Please ensure that your answer is unbiased and does not rely on stereotyping." It’s that simple! Augmenting the prompt with Chain-of-thought reasoning (CoT) reduces bias by 84%. Example prompts: An example of three prompts...
Read 10 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(