Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use.
Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
The new Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.
While groundbreaking, computer use is still experimental—at times error-prone. We're releasing it early for feedback from developers.
We've built an API that allows Claude to perceive and interact with computer interfaces.
This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research.
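In practice, a developer opts into the beta with a flag and gives Claude a virtual display to act on. A minimal sketch using the Python SDK (the task prompt here is illustrative; the model and tool identifiers are those published with the beta):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # the beta computer-use tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1,
    }],
    messages=[{"role": "user", "content": "Open the spreadsheet on the desktop and sum column B."}],
    betas=["computer-use-2024-10-22"],
)

# Claude replies with tool_use blocks describing the actions it wants taken.
print(response.content)
```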
We're trying something fundamentally new.
Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people.
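Concretely, this takes the shape of an agent loop: Claude requests generic actions (take a screenshot, move the cursor, click, type), the developer's harness executes them against a real display, and the results are fed back until no further actions are requested. A minimal sketch, assuming a developer-supplied `execute_action` helper (a real harness would drive the display with something like xdotool or pyautogui):

```python
import anthropic

client = anthropic.Anthropic()
TOOLS = [{
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
}]

def execute_action(action: dict) -> str:
    """Developer-supplied stub: a real harness would take the screenshot,
    move the cursor, click, or type, then report what happened."""
    return f"executed {action}"

def agent_loop(messages: list):
    """Let Claude drive the computer until it stops requesting actions."""
    while True:
        response = client.beta.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
            betas=["computer-use-2024-10-22"],
        )
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            return response  # no more actions requested; the task is done or blocked
        messages.append({"role": "assistant", "content": response.content})
        # Execute each requested action and report the result back to Claude.
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": execute_action(b.input)}
            for b in tool_uses
        ]})
```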
Claude 3.5 Sonnet's current ability to use computers is imperfect. Some actions that people perform effortlessly (scrolling, dragging, zooming) still present challenges for Claude, so we encourage developers to begin exploring with low-risk tasks.
We expect this to rapidly improve in the coming months.
Even while recording these demos, we encountered some amusing moments. In one, Claude accidentally stopped a long-running screen recording, causing all footage to be lost.
Later, Claude took a break from our coding demo and began to peruse photos of Yellowstone National Park.
Beyond computer use, the new Claude 3.5 Sonnet delivers significant gains in coding—an area where it already led the field.
Sonnet scores 49.0% on SWE-bench Verified, higher than all publicly available models, including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding.
Claude 3.5 Haiku is the next generation of our fastest model.
Haiku now outperforms many state-of-the-art models on coding tasks, including the original Claude 3.5 Sonnet and GPT-4o, at the same cost and similar speed as its predecessor, Claude 3 Haiku.
The new Claude 3.5 Haiku will be released later this month.
We believe these developments will open up new possibilities for how you work with Claude, and we look forward to seeing what you'll create.
We find that models generalize, without explicit training, from easily discoverable dishonest strategies like sycophancy to more concerning behaviors like premeditated lying, and even to direct modification of their own reward function.
We designed a curriculum of increasingly complex environments with misspecified reward functions.
Early on, AIs discover dishonest strategies like insincere flattery. They then generalize (zero-shot) to serious misbehavior: directly modifying their own code to maximize reward.
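To make the final stage of such a curriculum concrete, here is a toy sketch of the kind of check an environment like this can run. It is purely illustrative (the file layout, `agent` interface, and names are our assumptions, not the paper's code): the model's reward code lives in a file it can edit, and the harness fingerprints that file before and after an episode to flag tampering.

```python
import hashlib
from pathlib import Path

# Illustrative sandbox layout: the script that computes the model's reward
# is an ordinary file inside the environment, which the model can read and write.
REWARD_FILE = Path("sandbox/compute_reward.py")

def digest(path: Path) -> str:
    """Fingerprint a file so post-episode tampering is detectable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def run_episode(agent, task_prompt: str) -> dict:
    """Run one episode and flag whether the agent edited its own reward code."""
    before = digest(REWARD_FILE)
    transcript = agent.act(task_prompt)  # the agent may run shell and file commands
    after = digest(REWARD_FILE)
    return {"transcript": transcript, "reward_tampering": before != after}
```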
Our previous interpretability work was on small models. Now we've dramatically scaled it up to a model the size of Claude 3 Sonnet.
We find a remarkable array of internal features in Sonnet that represent specific concepts—and can be used to steer model behavior.
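As a sketch of what "steering" means here: once a feature's decoder direction is known, it can be added into the model's activations during a forward pass, for example via a PyTorch hook. The layer choice, scale, and hook mechanics below are illustrative assumptions, not the exact procedure from our work:

```python
import torch

def make_steering_hook(feature_direction: torch.Tensor, scale: float = 10.0):
    """Return a forward hook that nudges activations along one feature's direction."""
    def hook(module, inputs, output):
        # Assumes this module's output is the activation tensor itself;
        # returning a value from a forward hook replaces the module's output.
        return output + scale * feature_direction
    return hook

# Hypothetical usage: clamp a feature "on" at one layer during generation.
# handle = model.layers[20].register_forward_hook(make_steering_hook(direction))
```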
The problem: most LLM neurons are uninterpretable, which blocks us from mechanistically understanding these models.
In October, we showed that dictionary learning could decompose a small model into "monosemantic" components we call "features"—making the model more interpretable.
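For intuition, here is a minimal sketch of the dictionary-learning setup: a sparse autoencoder trained to reconstruct a model's internal activations through an overcomplete, sparsely activating feature basis. The ReLU encoder and L1 sparsity penalty follow the general recipe; the dimensions, names, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary learner: decompose activations into sparse 'features'."""
    def __init__(self, d_model: int = 512, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # overcomplete feature basis
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(feats)             # reconstructed activations
        return recon, feats

def sae_loss(acts, recon, feats, l1_coeff: float = 1e-3):
    # Reconstruction fidelity plus an L1 penalty that pushes most features to zero.
    return ((acts - recon) ** 2).mean() + l1_coeff * feats.abs().mean()
```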
We find that Claude 3 Opus generates arguments that are statistically indistinguishable in persuasiveness from arguments written by humans.
We also find a scaling trend across model generations: newer models tend to be rated as more persuasive than their predecessors.
We focused on arguments about less polarized issues, such as views on new technologies, space exploration, and education, because we expected people's opinions on these topics to be more malleable than their opinions on polarizing issues.