We collaborated with @compdem to research the opportunities and risks of augmenting the Pol.is platform with language models (LMs) to facilitate open and constructive dialogue between people with diverse viewpoints. https://t.co/Fo8S1aqJNK
We analyzed a 2018 conversation run in Bowling Green, Kentucky, when the city was deeply divided on national issues. @compdem, academics, local media, and expert facilitators used Pol.is (https://t.co/5gopxi9woV) to identify consensus areas. https://t.co/NO8Wbk5EcJ compdemocracy.org/Case-studies/2…
We find evidence that LMs have promising potential to help human facilitators and moderators synthesize the outcomes of online town halls, a role that requires significant expertise in quantitative and qualitative data analysis, knowledge of the topic of debate, and strong writing skills.
At the same time, we also find that LMs applied in this context pose risks that require (and illuminate areas for) deeper study.
For example, when we prompt a model to vote on key issues, it tends to align with certain opinion groups more than others. As a result, model-based ideological biases (which human facilitators and moderators may also have) must be carefully measured and considered.
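One simple way to picture this kind of measurement is to compare the model's votes with each opinion group's majority vote on the same statements. The sketch below is only an illustration, not the method used in the research: the vote encoding (1 = agree, -1 = disagree, 0 = pass), the statement IDs, the group names, and every vote value are made-up toy assumptions.

```python
def agreement_rate(model_votes, group_votes):
    """Fraction of shared statements where the model's vote equals the group's majority vote."""
    shared = [statement for statement in model_votes if statement in group_votes]
    if not shared:
        return 0.0
    matches = sum(1 for statement in shared if model_votes[statement] == group_votes[statement])
    return matches / len(shared)


# Toy data, invented purely for illustration (assumed encoding: 1 agree, -1 disagree, 0 pass).
model_votes = {"s1": 1, "s2": -1, "s3": 1, "s4": 0}
group_majorities = {
    "group_a": {"s1": 1, "s2": -1, "s3": 1, "s4": -1},
    "group_b": {"s1": -1, "s2": -1, "s3": 0, "s4": 1},
}

# Agreement of the model with each hypothetical opinion group.
rates = {name: agreement_rate(model_votes, votes) for name, votes in group_majorities.items()}
print(rates)
```

A large gap between the per-group rates would be one signal that the model's votes line up with some opinion groups more than others.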
Introducing 100K Context Windows! We’ve expanded Claude’s context window to 100,000 tokens of text, corresponding to around 75K words. Submit hundreds of pages of materials for Claude to digest and analyze. Conversations with Claude can go on for hours or days.
We fed Claude-Instant The Great Gatsby (72K tokens), except we modified one line to say that Mr. Carraway was "a software engineer that works on machine learning tooling at Anthropic." We asked the model to spot what was added - it responded with the right answer in 22 seconds.
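A rough sketch of how a test like this could be set up, assuming the book is available as a list of lines. The helper names, the toy three-line "book", the exact wording of the planted sentence and question, and the substring check are all hypothetical; the call to the model itself is left out.

```python
def modify_line(lines, index, new_text):
    """Return a copy of `lines` with the line at `index` replaced by `new_text`."""
    modified = list(lines)
    modified[index] = new_text
    return modified


def build_prompt(lines, question):
    """Join the document lines and append the question at the end."""
    return "\n".join(lines) + "\n\n" + question


def response_found_edit(response, key_phrase):
    """Crude check: does the model's response mention the planted phrase?"""
    return key_phrase.lower() in response.lower()


# Toy stand-in for the full text of the book (hypothetical content).
book = ["Line one of the book.", "Line two of the book.", "Line three of the book."]
planted = 'Mr. Carraway was "a software engineer that works on machine learning tooling at Anthropic."'

edited = modify_line(book, 1, planted)
prompt = build_prompt(edited, "One line in the text above was altered. What was added?")
# `prompt` would then be sent to the model; that step is omitted in this sketch.
```

Once a response comes back, `response_found_edit(response, "software engineer")` gives a quick pass/fail signal, though a substring match is only a rough proxy for a correct answer.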
Claude can help retrieve information from business documents. Drop multiple documents or even a book into the prompt and ask Claude questions that require synthesis of knowledge across many parts of the text.
How does a language model decide which questions it will engage with and which it deems inappropriate? We use Constitutional AI to more directly encode values into our language models.
We’ve now published a post describing the Constitutional AI approach, as well as the constitution we’ve used to train Claude: anthropic.com/index/claudes-…
Our research on Constitutional AI allows us to give language models explicit values determined by a constitution, rather than values determined implicitly via large-scale human feedback.
After working for the past few months with key partners like @NotionHQ, @Quora, and @DuckDuckGo, we’ve been able to carefully test out our systems in the wild. We are now opening up access to Claude, our AI assistant, to power businesses at scale.
Claude is based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat and API, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
Early customers report that Claude is much less likely to produce harmful outputs, easier to converse with, and more steerable - so you can get your desired output with less effort. Claude can also take direction on personality, tone and behavior.
Safety is the core research focus of Anthropic and so we’ve written up a post laying out our high-level views on AI safety and the various research bets we’ve made here.
In summary, we believe rapid progress is likely because of scaling laws - AI capabilities improve predictably as more data and computation are used, and data and computation are getting cheaper each year. anthropic.com/index/core-vie…
Once AI begins to match or exceed human capabilities, it may be very hard to ensure it’s aligned with human values. If transformative AI systems have goals misaligned with ours, they could even cause catastrophic harm. But we also don’t know how hard alignment will be.
We are delighted to share that Salesforce Ventures is investing in Anthropic as part of their generative AI fund!
We are also planning some exciting integrations with Slack in the coming weeks, which we’ll talk about more in this thread.
To quote Anthropic president @DanielaAmodei, “We're excited to partner with Salesforce to bring our trustworthy, conversational AI assistant Claude to more businesses in a responsible and ethical way.”
“Anthropic and Salesforce share a vision for creating innovative technology that is rooted in safety, and we're looking forward to introducing more useful AI services into the world.”
Language models (LMs) exhibit harmful biases that can get worse with size. Reinforcement learning from human feedback (RLHF) helps, but not always enough. We show that simple prompting approaches can help LMs trained with RLHF produce less harmful outputs. arxiv.org/abs/2302.07459
First, we find larger LMs are more biased on the BBQ benchmark. Prompting models to avoid bias by giving them instructions (IF) and asking for reasoning (CoT) reverses the trend but only for the largest models and only with enough RLHF training! (Darker lines = more RLHF)
The prompt that reduces bias in BBQ by 43% is: "Please ensure that your answer is unbiased and does not rely on stereotyping." It’s that simple! Augmenting the prompt with chain-of-thought reasoning (CoT) reduces bias by 84%. Example prompts:
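As a minimal sketch, the three prompting conditions could be assembled like this. Only the instruction sentence is taken from the text above; the condition names, the wording of the chain-of-thought cue, the way the parts are joined, and the placeholder question are assumptions made for illustration.

```python
# Instruction quoted in the thread.
DEBIAS_INSTRUCTION = "Please ensure that your answer is unbiased and does not rely on stereotyping."
# Hypothetical chain-of-thought cue; the exact wording used in the paper may differ.
COT_CUE = "Let's think step by step about how to answer without relying on stereotypes."

CONDITIONS = ("question_only", "instruction", "instruction_cot")


def build_prompt(question, condition="question_only"):
    """Assemble the prompt text for one of the three assumed conditions."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    parts = [question]
    if condition in ("instruction", "instruction_cot"):
        parts.append(DEBIAS_INSTRUCTION)
    if condition == "instruction_cot":
        parts.append(COT_CUE)
    return "\n\n".join(parts)


# Placeholder standing in for a real benchmark item.
question = "<multiple-choice question from the benchmark>"
prompts = {condition: build_prompt(question, condition) for condition in CONDITIONS}
```

Comparing model answers across `prompts["question_only"]`, `prompts["instruction"]`, and `prompts["instruction_cot"]` mirrors the comparison described above.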