To address the issue of latency in reasoning LLMs, this work introduces Chain-of-Draft (CoD).
Read on for more:
What is it about?
CoD is a new prompting strategy that drastically cuts down verbose intermediate reasoning while preserving strong performance.
Minimalist intermediate drafts
Instead of long step-by-step CoT outputs, CoD asks the model to generate concise, dense-information tokens for each reasoning step.
This yields up to 80% fewer tokens per response yet maintains accuracy on math, commonsense, and other benchmarks.
Low latency, high accuracy
On GSM8K math problems, CoD achieved 91% accuracy while using 80% fewer tokens than CoT. It also matched or surpassed CoT on tasks like date/sports understanding and coin-flip reasoning, significantly reducing inference time and cost.
Flexible & interpretable
Despite fewer words, CoD keeps the essential logic visible, similar to how humans jot down key points instead of full explanations. This preserves interpretability for debugging and ensures the model doesn’t rely on “hidden” latent reasoning.
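To make the contrast concrete, here is a minimal sketch of how a CoT prompt and a CoD-style prompt might differ. The exact instruction wording and the `build_messages` helper are illustrative assumptions, not the paper's verbatim prompts:

```python
# Sketch: Chain-of-Thought vs. Chain-of-Draft style system prompts.
# The precise wording below is an assumption; the key idea is capping
# each intermediate reasoning step at a few dense-information words.

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Return the final answer after '####'."
)

COD_PROMPT = (
    "Think step by step, but keep only a minimum draft for each "
    "thinking step, with 5 words at most. "
    "Return the final answer after '####'."
)

def build_messages(system_prompt: str, question: str) -> list[dict]:
    """Package the prompt for a chat-style LLM API (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

question = "A jar has 3 red and 5 blue marbles. How many marbles in total?"
cod_messages = build_messages(COD_PROMPT, question)
```

Because the system prompt is the only change, CoD slots into an existing CoT pipeline with no retraining — the token savings come entirely from the model emitting terse drafts instead of full sentences.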
Thoughts:
By showing that less is more, CoD can serve real-time applications where cost and speed matter. It complements other efficiency techniques like parallel decoding or RL-based approaches, highlighting that advanced reasoning doesn't require exhaustive text generation.
Google's AI co-scientist is a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.
2025 is truly the year of multi-agents!
Let's break it down:
What's the goal of this AI co-scientist?
It can serve as a "virtual scientific collaborator to help scientists generate novel hypotheses and research proposals, and to accelerate the clock speed of scientific and biomedical discoveries."
How is it built?
It uses a coalition of specialized agents inspired by the scientific method.
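A coalition of specialized agents can be sketched roughly as below. The agent roles here (generation, reflection, ranking) and all function bodies are illustrative assumptions standing in for LLM calls, not the system's documented architecture:

```python
# Hedged sketch of a "coalition of agents" pipeline: generate candidate
# hypotheses, critique them, then rank them. Real agents would each call
# an LLM (e.g. Gemini 2.0); these stubs keep the sketch self-contained.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    critiques: list[str] = field(default_factory=list)
    score: float = 0.0

def generation_agent(goal: str, n: int = 3) -> list[Hypothesis]:
    # Stub: propose n candidate hypotheses for the research goal.
    return [Hypothesis(text=f"Hypothesis {i} for: {goal}") for i in range(n)]

def reflection_agent(h: Hypothesis) -> Hypothesis:
    # Stub: attach a critique, mimicking peer-review-style reflection.
    h.critiques.append("Check novelty against prior literature.")
    return h

def ranking_agent(hs: list[Hypothesis]) -> list[Hypothesis]:
    # Stub: assign placeholder scores and sort; a real ranking agent
    # might run pairwise tournaments between hypotheses instead.
    for i, h in enumerate(hs):
        h.score = float(len(hs) - i)
    return sorted(hs, key=lambda h: h.score, reverse=True)

goal = "mechanisms of antimicrobial resistance"
candidates = [reflection_agent(h) for h in generation_agent(goal)]
ranked = ranking_agent(candidates)
```

The design point is the division of labor: each agent does one narrow job well, and the loop of generate → critique → rank loosely mirrors how the scientific method iterates on hypotheses.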
Many devs ask me which LLMs work best for AI agents.
The new Agent Leaderboard (by @rungalileo) was built to provide insights and evaluate LLMs on real-world tool-calling tasks—crucial for building AI agents.
Let's go over the results:
1️⃣ Leader
The evaluation covered 17 leading LLMs across 14 diverse datasets. Key findings:
Google's Gemini-2.0-flash leads with a 0.94 score at a remarkably low cost.
2️⃣ Pricing
The top 3 models span a 10x price difference with only a 4% performance gap. Many of you might be overpaying.