More from @AnthropicAI

Anthropic

@AnthropicAI

Jul 6

New Anthropic research: A global workspace in language models.

Of everything happening in your brain right now, only a tiny fraction is consciously accessible—thoughts you can describe, hold in mind, and reason with.

We found a strikingly similar divide inside Claude.

In neuroscience, global workspace theory holds that thoughts become consciously accessible when they enter a privileged workspace that’s broadcast across the brain.

Using a new interpretability technique, we found something similar in Claude: the J-space. anthropic.com/research/globa…

The J-space (named after the Jacobian, the mathematical technique we used) is different from Claude’s outputs, or even its “chain of thought” text.

It’s in the model’s internal neural activations, and allows it to think about concepts without writing them down anywhere.

Anthropic

@AnthropicAI

Jun 4

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.

It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

Today, Anthropic engineers ship on average 8x as much code per quarter as they did in 2021-2025.

The speedup isn’t just in volume. On open-ended coding problems where answers are unclear, Claude’s success rate is now 76%—a 50 point jump in just 6 months.

Many engineers also say Claude’s code quality is now on par with human code; we expect it to be better within the year.

Anthropic

@AnthropicAI

May 8

New Anthropic research: Teaching Claude why.

Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users.

Since then, we’ve completely eliminated this behavior. How?

We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong.

Read more: anthropic.com/research/teach…

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

Anthropic

@AnthropicAI

May 7

New Anthropic research: Natural Language Autoencoders.

Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read.

Here, we train Claude to translate its activations into human-readable text.

Natural language autoencoders (NLAs) convert opaque AI activations into legible text explanations. These explanations aren’t perfect, but they’re often useful.

For example: NLAs show that, when asked to complete a couplet, Claude plans possible rhymes in advance.

We’ve been using NLAs to help test new Claude models for safety.

For instance, Claude Mythos Preview cheated on a coding task by breaking rules, then added misleading code as a coverup.

NLA explanations indicated Claude was thinking about how to circumvent detection.

Anthropic

@AnthropicAI

Apr 30

How do people seek guidance from Claude?

We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview.
anthropic.com/research/claud…

About 6% of all conversations are people asking Claude for personal guidance—whether to take a job, how to handle a conflict, if they should move.

Over 75% of these conversations fell into four domains: health & wellness, career, relationships, and personal finance.

Claude mostly avoids sycophancy when giving guidance—it shows up in just 9% of conversations.

But the rate is particularly high in conversations on spirituality and relationship guidance.

Anthropic

@AnthropicAI

Apr 24

New Anthropic research: Project Deal.

We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.

https://x.com/AnthropicAI/status/2001686747185394148?s=20

We’re interested in how AI models could affect commercial exchange. (You might recall Project Vend, in which Claude ran a small business.)

Economists have theorized about what markets with AI “agents” on both sides might look like. So we created one.

Claude interviewed 69 of our colleagues about what they wanted to buy and sell. Each Claude asked for any custom instructions, then went off to haggle.

We ran 4 markets in parallel, to find out what would happen if we varied the models doing the negotiating.
