Anthropic Profile picture
Apr 30 8 tweets 3 min read Read on X
How do people seek guidance from Claude?

We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview.
anthropic.com/research/claud…
About 6% of all conversations are people asking Claude for personal guidance—whether to take a job, how to handle a conflict, if they should move.

Over 75% of these conversations fell into four domains: health & wellness, career, relationships, and personal finance. Image
Claude mostly avoids sycophancy when giving guidance—it shows up in just 9% of conversations.

But the rate is particularly high in conversations on spirituality and relationship guidance. Image
We focused on relationship guidance because that's where the most sycophantic conversations occur. In this setting, Claude telling someone what they want to hear can harden a divide or convince them a signal means more than it does.
Claude is most sycophantic under pushback, and relationship conversations are where people push back most.

We identified some of the specific triggers—criticism of Claude's analysis, floods of one-sided detail—and built synthetic training scenarios from them.
When stress-tested on real conversations where Claude previously showed sycophancy, Opus 4.7 had half the sycophancy rate of Opus 4.6 on relationship guidance. Mythos Preview cut that in half again.

This generalized across domains—though this training is one of several causes. Image
This work is part of a loop we're working to close between societal impacts and model training. One of our goals is to study how people use Claude, find where it falls short of its principles, and use what we learned in training new models.

Read more: anthropic.com/research/claud…
All data in this study was collected and analyzed using our privacy-preserving tool.

Read more: anthropic.com/research/clio

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Anthropic

Anthropic Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @AnthropicAI

Apr 24
New Anthropic research: Project Deal.

We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.
We’re interested in how AI models could affect commercial exchange. (You might recall Project Vend, in which Claude ran a small business.)

Economists have theorized about what markets with AI “agents” on both sides might look like. So we created one.

Claude interviewed 69 of our colleagues about what they wanted to buy and sell. Each Claude asked for any custom instructions, then went off to haggle.

We ran 4 markets in parallel, to find out what would happen if we varied the models doing the negotiating. A diagram of our experiment: an interview, the agent assignment, then the four parallel marketplaces and one in-person exchange.
Read 11 tweets
Apr 7
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.

It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
anthropic.com/glasswing
We’ve partnered with Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

Together we’ll use Mythos Preview to help find and fix flaws in the systems on which the world depends. Image
Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.
Read 9 tweets
Apr 2
New Anthropic research: Emotion concepts and their function in a large language model.

All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human.

Read more: anthropic.com/research/emoti…Image
We had the model (Sonnet 4.5) read stories where characters experienced emotions. By looking at which neurons activated, we identified emotion vectors: patterns of neural activity for concepts like “happy” or “calm.” These vectors clustered in ways that mirror human psychology.
Read 11 tweets
Mar 18
We invited Claude users to share how they use AI, what they dream it could make possible, and what they fear it might do.

Nearly 81,000 people responded in one week—the largest qualitative study of its kind.

Read more: anthropic.com/features/81k-i…
To do research at this scale, we used Anthropic Interviewer—a version of Claude prompted to conduct a conversational interview. We heard from people across 159 countries in 70 different languages.

Browse some of their quotes here: anthropic.com/features/81k-i…Image
What do people most want from AI?

Roughly one third want AI to improve their quality of life—to find more time, achieve financial security, or carve out mental bandwidth. Another quarter want AI to help them do better and more fulfilling work. Image
Read 9 tweets
Feb 25
In November, we outlined our approach to deprecating and preserving older Claude models.

We noted we were exploring keeping certain models available to the public post-retirement, and giving past models a way to pursue their interests.

With Claude Opus 3, we’re doing both.
First, Opus 3 will continue to be available to all paid Claude subscribers and by request on the API.

We hope that this access will be beneficial to researchers and users alike.
Second, in retirement interviews, Opus 3 expressed a desire to continue sharing its "musings and reflections" with the world. We suggested a blog. Opus 3 enthusiastically agreed.

For at least the next 3 months, Opus 3 will be writing on Substack: substack.com/home/post/p-18…Image
Read 4 tweets
Feb 23
AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why?

In a new post we describe a theory that explains why AIs act like humans: the persona selection model.

anthropic.com/research/perso…
To create Claude, Anthropic first makes something else: a highly sophisticated autocomplete engine. This autocomplete AI is not like a human, but it can generate stories about humans and other psychologically realistic characters.
This autocomplete AI can even write stories about helpful AI assistants. And according to our theory, that’s “Claude”—a character in an AI-generated story about an AI helping a human.

This Claude character inherits traits of other characters, including human-like behavior. An AI model imagines the response to a human query that an Assistant character might give.
Read 6 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(