To address the issue of latency in reasoning LLMs, this work introduces Chain-of-Draft (CoD).
Read on for more:
What is it about?
CoD is a new prompting strategy that drastically cuts down verbose intermediate reasoning while preserving strong performance.
Minimalist intermediate drafts
Instead of long step-by-step CoT outputs, CoD asks the model to generate concise, dense-information tokens for each reasoning step.
This yields up to 80% fewer tokens per response yet maintains accuracy on math, commonsense, and other benchmarks.
Low latency, high accuracy
On GSM8K math problems, CoD achieved 91% accuracy while using 80% fewer tokens than CoT. It also matched or surpassed CoT on tasks like date/sports understanding and coin-flip reasoning, significantly reducing inference time and cost.
Flexible & interpretable
Despite fewer words, CoD keeps the essential logic visible, similar to how humans jot down key points instead of full explanations. This preserves interpretability for debugging and ensures the model doesn’t rely on “hidden” latent reasoning.
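To make the contrast concrete, here is a minimal sketch of how a CoT prompt and a CoD-style prompt might differ. The exact instruction wording and the `build_messages` helper are illustrative assumptions, not the paper's verbatim prompts:

```python
# Sketch: Chain-of-Thought vs. Chain-of-Draft style system prompts.
# The precise wording below is an assumption; the key idea is capping
# each intermediate reasoning step at a few dense-information words.

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Return the final answer after '####'."
)

COD_PROMPT = (
    "Think step by step, but keep only a minimum draft for each "
    "thinking step, with 5 words at most. "
    "Return the final answer after '####'."
)

def build_messages(system_prompt: str, question: str) -> list[dict]:
    """Package the prompt for a chat-style LLM API (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

question = "A jar has 3 red and 5 blue marbles. How many marbles in total?"
cod_messages = build_messages(COD_PROMPT, question)
```

Because the system prompt is the only change, CoD slots into an existing CoT pipeline with no retraining — the token savings come entirely from the model emitting terse drafts instead of full sentences.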
Thoughts:
By showing that less is more, CoD can serve real-time applications where cost and speed matter. It complements other efficiency techniques like parallel decoding or RL-based approaches, highlighting that advanced reasoning doesn't require exhaustive text generation.
Google's AI co-scientist is a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.
2025 is truly the year of multi-agents!
Let's break it down:
What's the goal of this AI co-scientist?
It can serve as a "virtual scientific collaborator to help scientists generate novel hypotheses and research proposals, and to accelerate the clock speed of scientific and biomedical discoveries."
How is it built?
It uses a coalition of specialized agents inspired by the scientific method.
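A coalition of specialized agents can be sketched roughly as below. The agent roles here (generation, reflection, ranking) and all function bodies are illustrative assumptions standing in for LLM calls, not the system's documented architecture:

```python
# Hedged sketch of a "coalition of agents" pipeline: generate candidate
# hypotheses, critique them, then rank them. Real agents would each call
# an LLM (e.g. Gemini 2.0); these stubs keep the sketch self-contained.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    critiques: list[str] = field(default_factory=list)
    score: float = 0.0

def generation_agent(goal: str, n: int = 3) -> list[Hypothesis]:
    # Stub: propose n candidate hypotheses for the research goal.
    return [Hypothesis(text=f"Hypothesis {i} for: {goal}") for i in range(n)]

def reflection_agent(h: Hypothesis) -> Hypothesis:
    # Stub: attach a critique, mimicking peer-review-style reflection.
    h.critiques.append("Check novelty against prior literature.")
    return h

def ranking_agent(hs: list[Hypothesis]) -> list[Hypothesis]:
    # Stub: assign placeholder scores and sort; a real ranking agent
    # might run pairwise tournaments between hypotheses instead.
    for i, h in enumerate(hs):
        h.score = float(len(hs) - i)
    return sorted(hs, key=lambda h: h.score, reverse=True)

goal = "mechanisms of antimicrobial resistance"
candidates = [reflection_agent(h) for h in generation_agent(goal)]
ranked = ranking_agent(candidates)
```

The design point is the division of labor: each agent does one narrow job well, and the loop of generate → critique → rank loosely mirrors how the scientific method iterates on hypotheses.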
Many devs ask me which LLMs work best for AI agents.
The new Agent Leaderboard (by @rungalileo) was built to provide insights and evaluate LLMs on real-world tool-calling tasks—crucial for building AI agents.
Let's go over the results:
1️⃣ Leader
The evaluation covered 17 leading LLMs across 14 diverse datasets. Key findings:
Google's Gemini-2.0-flash leads with a 0.94 score at a remarkably low cost.
2️⃣ Pricing
The top 3 models span a 10x price difference with only a 4% performance gap. Many of you might be overpaying.