elvis
Building with AI agents @dair_ai • Prev: Meta AI, Galactica LLM, Elastic, PaperswithCode, PhD • I also teach how to leverage and build with LLMs & AI Agents ⬇️
May 3 6 tweets 3 min read
A Survey of AI Agent Protocols

5 things that stood out to me about this report:

Agent Internet Ecosystem

Here is what the layered architecture of the agent internet ecosystem looks like as it stands. It shows different layers, like the Agent Internet, the Protocol Layer, and the Application Layer.
May 1 8 tweets 3 min read
Small reasoning models are here!

Microsoft just released Phi-4-Mini-Reasoning to explore small reasoning language models for math.

Let's find out how this all works:

Phi-4-Mini-Reasoning

The paper introduces Phi-4-Mini-Reasoning, a 3.8B parameter small language model (SLM) that achieves state-of-the-art mathematical reasoning performance, rivaling or outperforming models nearly TWICE its size.
Apr 30 7 tweets 3 min read
Universal RAG

RAG is dead, they said.

Then you see papers like this and it gives you a better understanding of the opportunities and challenges ahead.

Lots of great ideas in this paper. I've summarized a few below:

What is it?

UniversalRAG is a framework that overcomes the limitations of existing RAG systems confined to single modalities or corpora. It supports retrieval across modalities (text, image, video) and at multiple granularities (e.g., paragraph vs. document, clip vs. video).
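The routing step described above can be sketched in a few lines. This is a toy illustration of the idea, not the paper's implementation; the corpora, router rules, and function names are all invented for the example (the paper trains a router model rather than using keyword rules):

```python
# Hypothetical sketch of UniversalRAG-style routing: a router picks a corpus
# at the right modality and granularity before retrieval runs.

CORPORA = {
    ("text", "paragraph"): ["para_1", "para_2"],
    ("text", "document"): ["doc_1"],
    ("image", "image"): ["img_1"],
    ("video", "clip"): ["clip_1"],
    ("video", "full"): ["vid_1"],
}

def route(query):
    """Toy keyword router; the paper trains a model for this step."""
    q = query.lower()
    if "video" in q or "scene" in q:
        return ("video", "clip")
    if "diagram" in q or "photo" in q:
        return ("image", "image")
    if "summarize the paper" in q:
        return ("text", "document")
    return ("text", "paragraph")

def retrieve(query):
    """Retrieve from the corpus the router selected."""
    modality, granularity = route(query)
    return CORPORA[(modality, granularity)]

print(retrieve("Show me the video scene where the goal is scored"))
```

The point is that a single query hits exactly one corpus at the right granularity, instead of searching every modality at once.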
Apr 29 9 tweets 3 min read
Building Production-Ready AI Agents with Scalable Long-Term Memory

Memory is one of the most challenging bits of building production-ready agentic systems.

Lots of goodies in this paper.

Here is my breakdown:

What does it solve?

It proposes a memory-centric architecture for LLM agents to maintain coherence across long conversations and sessions, solving the fixed-context window limitation.
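A minimal sketch of the memory-centric idea, assuming a simple word-overlap retriever (the paper's system uses learned memory extraction and dense retrieval; every name here is illustrative):

```python
# Illustrative sketch, not the paper's code: a tiny long-term memory store
# that persists facts across sessions and retrieves the most relevant ones,
# instead of stuffing the full conversation history into the context window.

class MemoryStore:
    def __init__(self):
        self.memories = []

    def add(self, fact):
        """Persist an extracted fact for later sessions."""
        self.memories.append(fact)

    def search(self, query, k=2):
        """Rank stored facts by word overlap with the query (toy retriever)."""
        qwords = set(query.lower().split())
        scored = sorted(
            self.memories,
            key=lambda m: len(qwords & set(m.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = MemoryStore()
store.add("User prefers vegetarian recipes")
store.add("User lives in Berlin")
store.add("User is allergic to peanuts")
print(store.search("suggest a vegetarian dinner"))
```

Only the top-k retrieved memories go into the prompt, so context cost stays flat no matter how long the history grows.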
Apr 29 5 tweets 2 min read
A Survey of Efficient LLM Inference Serving

This one provides a comprehensive taxonomy of recent system-level innovations for efficient LLM inference serving.

Great overview for devs working on inference.

Here is what's included:

Instance-Level Methods

Techniques like model parallelism (pipeline, tensor, context, and expert parallelism), offloading (e.g., ZeRO-Offload, FlexGen, TwinPilots), and request scheduling (inter- and intra-request) are reviewed...
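To see why request scheduling matters, here is a toy calculation (my own illustration, not from the survey) comparing first-come-first-served with shortest-job-first ordering of requests:

```python
# Toy model of inter-request scheduling: one long prompt arriving first
# can block several short ones. Reordering by estimated cost cuts the
# average completion time, one motivation behind LLM request schedulers.

def avg_completion_time(jobs):
    """Average completion time for jobs run sequentially, in the given order."""
    total, elapsed = 0, 0
    for cost in jobs:
        elapsed += cost
        total += elapsed
    return total / len(jobs)

arrival_order = [100, 5, 5, 5]                      # long request arrives first
fcfs = avg_completion_time(arrival_order)           # first-come-first-served
sjf = avg_completion_time(sorted(arrival_order))    # shortest-job-first
print(fcfs, sjf)
```

Real schedulers must estimate output lengths and avoid starving long requests, but the averaged-latency win is the same basic effect.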
Apr 27 8 tweets 3 min read
265 pages of everything you need to know about building AI agents.

5 things that stood out to me about this report:

1. Human Brain and LLM Agents

It's great for understanding what differentiates LLM agents from human/brain cognition, and what inspiration we can draw from the way humans learn and operate.
Apr 16 21 tweets 7 min read
BREAKING: OpenAI introduces new o-series models

o3 and o4-mini

OpenAI claims that these are models that can produce novel and useful ideas.

Here is all you need to know:

They are rolling out starting today in ChatGPT and the API.

These reasoning models have gotten much better at using internal tools to solve very complex tasks.
Apr 9 10 tweets 2 min read
NEW: Google announces Agent2Agent

Agent2Agent (A2A) is a new open protocol that lets AI agents securely collaborate across ecosystems regardless of framework or vendor.

Here is all you need to know:

Universal agent interoperability

A2A allows agents to communicate, discover each other’s capabilities, negotiate tasks, and collaborate even if built on different platforms. This enables complex enterprise workflows to be handled by a team of specialized agents.
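The discovery-and-delegation flow can be sketched like this. The field names below are simplified stand-ins, not the official A2A schema (A2A agents publish richer "agent cards" than these toy dicts):

```python
# Illustrative sketch of A2A-style capability discovery: each agent publishes
# a card describing its skills, and a client routes a task to a capable agent
# regardless of which framework or vendor built it.

agent_cards = [
    {"name": "invoice-agent", "skills": ["parse_invoice", "extract_totals"]},
    {"name": "travel-agent", "skills": ["book_flight", "find_hotel"]},
]

def discover(skill):
    """Find the first registered agent advertising the requested skill."""
    for card in agent_cards:
        if skill in card["skills"]:
            return card
    return None

task = {"skill": "book_flight", "input": "SFO to NYC, May 10"}
agent = discover(task["skill"])
print(agent["name"])
```

In the real protocol, discovery, task negotiation, and results all flow over standardized messages, so a team of specialized agents can share one workflow.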
Apr 5 16 tweets 7 min read
Llama 4 is here!

- Llama 4 Scout & Maverick are up for download
- Llama 4 Behemoth (preview)
- Advanced problem solving & multilingual
- Support long context up to 10M tokens
- Great for multimodal apps & agents
- Image grounding
- Top performance at the lowest cost
- Can be served within $0.19-$0.49/M tokens

LMArena ELO score vs. cost

"To deliver a user experience with a decode latency of 30ms for each token after a one-time 350ms prefill latency, we estimate that the model can be served within a range of $0.19-$0.49 per million tokens (3:1 blend)"
Mar 13 5 tweets 2 min read
Prompt Engineering is NOT dead!

If you develop seriously with LLMs and are building complex agentic flows, you don't need convincing about this.

I've built the most comprehensive, up-to-date course on prompting LLMs, including reasoning LLMs.

4 hours of content! All Python!

Check it out if you're building AI Agents or RAG systems -- prompting tips, emerging use cases, advanced prompting techniques, enhancing LLM reliability, and much more.

All code examples use pure Python and the OpenAI SDKs. That's it!
Mar 11 16 tweets 6 min read
NEW: OpenAI announces new tools for building agents.

Here is everything you need to know:

OpenAI has already launched two big agent products: Deep Research and Operator.

The tools are now coming to the API for developers to build their own agents.
Mar 5 8 tweets 3 min read
A Few Tokens Are All You Need

Can you cut the fine-tuning costs of an LLM by 75% and keep strong reasoning performance?

A new paper from the Tencent AI Lab claims that it might just be possible.

Let's find out how:

The First Few Tokens

It shows that a tiny prefix is all you need to improve your model's reasoning, with no labels or massive datasets required.

It uses unsupervised prefix fine-tuning (UPFT), which requires only prefix substrings (as few as 8 tokens) of generated solutions.
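The core trick can be sketched as a data-preparation step. This is an illustration of the prefix idea, not the paper's code; the function and field names are invented:

```python
# Illustrative sketch of the UPFT idea: keep only the first few tokens of
# each self-generated solution and fine-tune on those prefixes, instead of
# training on full labeled reasoning traces.

PREFIX_TOKENS = 8  # the thread notes prefixes as short as 8 tokens

def make_prefix_example(question, generated_solution):
    """Build an unsupervised training pair from a solution prefix."""
    prefix = " ".join(generated_solution.split()[:PREFIX_TOKENS])
    return {"prompt": question, "target": prefix}

example = make_prefix_example(
    "What is 12 * 7?",
    "First, multiply 12 by 7 step by step: 12 * 7 = 84, so the answer is 84.",
)
print(example["target"])
```

Because only the opening of each sampled solution is kept, the training set is far smaller and needs no verified answers, which is where the claimed cost savings come from.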
Feb 27 7 tweets 2 min read
Say goodbye to Chain-of-Thought.

Say hello to Chain-of-Draft.

To address the issue of latency in reasoning LLMs, this work introduces Chain-of-Draft (CoD).

Read on for more:

What is it about?

CoD is a new prompting strategy that drastically cuts down verbose intermediate reasoning while preserving strong performance.
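As a sketch, the difference comes down to the system prompt. The wording below paraphrases the idea and is not the paper's exact prompt:

```python
# Chain-of-Thought vs. Chain-of-Draft, expressed as two system prompts.
# CoD asks for terse draft steps instead of verbose reasoning, which cuts
# output tokens and therefore latency.

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Explain your full reasoning before giving the final answer."
)

COD_PROMPT = (
    "Think step by step, but keep only a minimum draft of each step, "
    "at most five words per step. Return the final answer after '####'."
)

def build_messages(system_prompt, question):
    """Assemble a chat-completion message list for either strategy."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    COD_PROMPT,
    "A bag has 3 red and 5 blue balls. How many balls in total?",
)
print(messages[0]["content"])
```

Swapping `COD_PROMPT` for `COT_PROMPT` is the whole intervention; the model, question, and decoding settings stay the same.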
Feb 20 14 tweets 5 min read
NEW: Sakana AI introduces The AI CUDA Engineer.

It's an end-to-end agentic system that can produce highly optimized CUDA kernels.

This is wild! They used AI to discover ways to make AI run faster!

Let's break it down:

The Backstory

Sakana AI's mission is to build more advanced and efficient AI using AI.

Their previous work includes The AI Scientist, LLMs that discover more efficient methods to train LLMs, and automated discovery of new AI foundation models.

And now they just launched The AI CUDA Engineer.
Feb 19 11 tweets 4 min read
NEW: Google introduces AI co-scientist.

It's a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.

2025 is truly the year of multi-agents!

Let's break it down:

What's the goal of this AI co-scientist?

It can serve as a "virtual scientific collaborator to help scientists generate novel hypotheses and research proposals, and to accelerate the clock speed of scientific and biomedical discoveries."
Feb 18 23 tweets 7 min read
BREAKING: xAI announces Grok 3

Here is everything you need to know:

Elon mentioned that Grok 3 is an order of magnitude more capable than Grok 2.
Feb 15 8 tweets 2 min read
Introducing... Agent Leaderboard!

Many devs ask me which LLMs work best for AI agents.

The new Agent Leaderboard (by @rungalileo) was built to provide insights and evaluate LLMs on real-world tool-calling tasks—crucial for building AI agents.

Let's go over the results:

1️⃣ Leader

After evaluating 17 leading LLMs across 14 diverse datasets, here are the key findings:

Google's Gemini-2.0-Flash leads with a 0.94 score at a remarkably low cost.
Jan 23 16 tweets 4 min read
OpenAI Introduces Operator & Agents!

Here is everything you need to know:

Operator is a system that can use a web browser to accomplish tasks.

Operator can look at a webpage and interact with it by typing, clicking, and scrolling.

It's available as a research preview in the US for Pro users, with access for Plus users coming later.
Jan 21 4 tweets 2 min read
Goodbye web scrapers!

Say hello to /extract by @firecrawl_dev

Just write a prompt and get the web data you need!

It doesn’t get any simpler than this.

The /extract endpoint is simple to use. Provide a prompt and a schema and retrieve any data you need from a website.

I’ve added the /* to the URL to find and extract information across the entire website.

The endpoint can return up to thousands of data points at once.
Jan 20 4 tweets 2 min read
The DeepSeek-R1 paper is a gem!

Highly encourage everyone to read it.

It's clear that LLM reasoning capabilities can be learned in different ways.

RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties.

There is more to RL than meets the eye!

Here is my breakdown of the paper along with a few tests: youtu.be/3GlFd3doO3U?si…

The multi-stage training might not make sense at first, but it provides clues about optimizations we can continue to tap into.

Data quality is still very important for enhancing the usability of the LLM.

Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open so we can build on top of it. This opens up exciting research opportunities.

About the attached clip: the previous preview model wasn't able to solve this task. DeepSeek-R1 can solve this and many other tasks that o1 can solve. It's a very good model for coding and math. When DeepSeek said "on par with OpenAI-o1," I thought they were just hyping. But based on my tests, it clearly isn't hype.

Wanted to add that DeepSeek-R1 got all of the hard tasks from the OpenAI LLM reasoning blog post correct for me. This is wild and totally unexpected! The only task where it failed (a crossword puzzle) is one that o1 also fails.
Jan 8 14 tweets 4 min read
Agents Overview

Great write-up on Agents by Chip.

Here are my takeaways:

🤖 Agents Overview

An AI agent is made up of both the environment it operates in (e.g., a game, the internet, or computer system) and the set of actions it can perform through its available tools. This dual definition is fundamental to understanding how agents work.
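That dual definition can be sketched as a small data structure. The types and tool names are illustrative, not Chip's code:

```python
# Minimal sketch of the dual definition: an agent is characterized by the
# environment it operates in plus the actions its available tools expose.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    environment: str                              # e.g. "web browser", "computer system"
    tools: dict = field(default_factory=dict)     # action name -> callable tool

    def act(self, tool_name, *args):
        """Perform an action by invoking one of the available tools."""
        if tool_name not in self.tools:
            raise ValueError(f"Action '{tool_name}' not available in {self.environment}")
        return self.tools[tool_name](*args)

browser_agent = Agent(
    environment="web browser",
    tools={"search": lambda q: f"results for '{q}'"},
)
print(browser_agent.act("search", "AI agents"))
```

Changing either the environment or the tool set changes what the agent *is*, which is exactly why the write-up treats the two together.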