More from @omarsar0

elvis

@omarsar0

Jul 1

Small Language Models are the Future of Agentic AI

Lots to gain from building agentic systems with small language models.

Capabilities are increasing rapidly!

AI devs should be exploring SLMs.

Here are my notes:

Overview

This position paper argues that small language models (SLMs), defined pragmatically as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications, especially when tasks are narrow, repetitive, or tool-oriented.

The authors propose that shifting from LLM-first to SLM-first architectures will yield major gains in efficiency, modularity, and sustainability.
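One way to picture an SLM-first architecture is a router that hands every request to a small model and escalates to a large one only when the small model declines. The sketch below is an illustration under assumptions, not the paper's design: `SLMFirstRouter`, `toy_slm`, and `toy_llm` are hypothetical names, and the "models" are plain functions standing in for real inference calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical model interface: takes a prompt, returns text,
# or None when the model cannot handle the request.
Model = Callable[[str], Optional[str]]


@dataclass
class SLMFirstRouter:
    slm: Model  # small model, assumed cheap and runnable locally
    llm: Model  # large model, assumed to be the expensive fallback

    def run(self, prompt: str) -> Tuple[str, str]:
        """Try the small model first; escalate only if it declines."""
        answer = self.slm(prompt)
        if answer is not None:
            return "slm", answer
        fallback = self.llm(prompt)
        return "llm", fallback if fallback is not None else ""


def toy_slm(prompt: str) -> Optional[str]:
    # Stand-in for a narrow, tool-oriented model: handles only "add <a> <b>".
    parts = prompt.split()
    if len(parts) == 3 and parts[0] == "add" and parts[1].isdigit() and parts[2].isdigit():
        return str(int(parts[1]) + int(parts[2]))
    return None  # anything else is out of scope for the small model


def toy_llm(prompt: str) -> Optional[str]:
    # Stand-in for a general-purpose large model.
    return f"[large model answer to: {prompt}]"


router = SLMFirstRouter(slm=toy_slm, llm=toy_llm)
narrow = router.run("add 2 3")           # handled by the small model
broad = router.run("explain diffusion")  # escalated to the large model
```

In a real system the decline signal would more likely come from a confidence score or a task classifier than from a `None` return; that choice is left open here.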

elvis

@omarsar0

Jun 24

Ultra-Fast LLMs Based on Diffusion

> throughputs of 1109 tokens/sec and 737 tokens/sec
> outperforms speed-optimized frontier models by up to 10× on average

Diffusion LLMs are early, but could be huge.

More in my notes below:

✦ Overview

This paper introduces Mercury, a family of large-scale diffusion-based language models (dLLMs) optimized for ultra-fast inference.

Unlike standard autoregressive LLMs, Mercury models generate multiple tokens in parallel via a coarse-to-fine refinement process.

✦ Achieves higher throughput without sacrificing output quality

The release focuses on code generation, with Mercury Coder Mini and Small models achieving up to 1109 and 737 tokens/sec, respectively, on NVIDIA H100s.

Outperforms speed-optimized frontier models by up to 10×.

elvis

@omarsar0

Jun 23

This paper is impressive!

It introduces a clever way of keeping memory use constant regardless of task length.

Great use of RL for AI agents to efficiently use memory and reasoning.

Here are my full notes:

Overview

The paper presents an RL framework for training language agents that operate efficiently over long-horizon, multi-turn tasks by learning to consolidate memory and reasoning into a compact internal state.

Constant Memory Size

Unlike traditional agents that append all past interactions, leading to ballooning memory usage and degraded performance, MEM1 maintains a constant memory size by discarding obsolete context after each reasoning step.
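The contrast between appending everything and keeping a fixed-size state can be sketched as follows. This is a toy illustration, not MEM1 itself: in the paper the consolidation is learned with RL, whereas the hypothetical `consolidate` function here simply truncates to the most recent characters to show the bounded-memory shape of the loop.

```python
from typing import List


def consolidate(state: str, observation: str, max_chars: int = 80) -> str:
    """Hypothetical stand-in for learned consolidation: merge, then keep a bounded tail."""
    merged = f"{state} | {observation}" if state else observation
    return merged[-max_chars:]


def run_append_all(observations: List[str]) -> List[int]:
    """Baseline agent: context grows with every turn."""
    context: List[str] = []
    sizes: List[int] = []
    for obs in observations:
        context.append(obs)
        sizes.append(sum(len(c) for c in context))
    return sizes


def run_constant_memory(observations: List[str], max_chars: int = 80) -> List[int]:
    """Bounded agent: the old context is replaced by the consolidated state each turn."""
    state = ""
    sizes: List[int] = []
    for obs in observations:
        state = consolidate(state, obs, max_chars)
        sizes.append(len(state))
    return sizes


observations = [f"turn {i}: tool returned result {i}" for i in range(20)]
append_sizes = run_append_all(observations)         # grows every turn
constant_sizes = run_constant_memory(observations)  # never exceeds max_chars
```

The point of the sketch is only the memory profile: the first list grows with task length, the second stays under a fixed cap no matter how many turns are run.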

elvis

@omarsar0

Jun 23

Towards AI Search Paradigm

Very detailed report on building scalable multi-agent AI search systems.

Multi-agent, DAG, MCPs, RL, and much more.

If you are a dev integrating search into your AI agents, look no further:

Quick Overview

The paper proposes a modular multi-agent system that reimagines how AI handles complex search tasks, aiming to emulate human-like reasoning and information synthesis.

Multi-agent, Modular architecture

- Master analyzes queries and orchestrates the workflow
- Planner builds a DAG of sub-tasks using a dynamic capability boundary informed by the query
- Executor runs these sub-tasks using appropriate tools (e.g., web search, calculator)
- Writer composes the final answer from intermediate outputs
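The Planner, Executor, and Writer roles listed above can be sketched with a small task DAG run in dependency order. Everything here is an assumption for illustration: the plan format, the `web_search` and `calculator` stand-ins, and the `FAKE_INDEX` data are hypothetical and not taken from the paper. The Master role is omitted; the plan is simply given.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data standing in for a real search backend.
FAKE_INDEX = {"height of A": 30.0, "height of B": 12.0}


def web_search(query: str) -> float:
    """Stand-in search tool: looks the query up in a fixed table."""
    return FAKE_INDEX[query]


def calculator(a: float, b: float) -> float:
    """Stand-in calculator tool: returns the difference a - b."""
    return a - b


# Planner output (assumed shape): task name -> (tool, literal args, dependencies).
plan = {
    "find_a": (web_search, ["height of A"], []),
    "find_b": (web_search, ["height of B"], []),
    "diff": (calculator, [], ["find_a", "find_b"]),
}


def execute(plan):
    """Executor: run each sub-task after all of its dependencies have finished."""
    graph = {task: set(deps) for task, (_, _, deps) in plan.items()}
    results = {}
    for task in TopologicalSorter(graph).static_order():
        tool, args, deps = plan[task]
        # Dependency results are passed to the tool in the order listed in the plan.
        inputs = list(args) + [results[d] for d in deps]
        results[task] = tool(*inputs)
    return results


def write_answer(results) -> str:
    """Writer: compose the final answer from intermediate outputs."""
    return f"A is {results['diff']} units taller than B."


results = execute(plan)
answer = write_answer(results)
```

Independent nodes such as `find_a` and `find_b` could run concurrently in a real executor; the sequential loop keeps the sketch short.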

elvis

@omarsar0

Jun 22

Another insane report from Anthropic.

They find that LLM agents engage in blackmail at high rates when threatened with replacement.

Faced with replacement threats, the models would use statements like “Self-preservation is critical.”

This is wild!

More findings below:

Quick Overview

The study introduces the concept of agentic misalignment, where LLM-based agents autonomously choose to harm their deploying organization when faced with threats to their autonomy or conflicts between their goals and the company’s direction.

The setup

Anthropic tested 16 leading models, including Claude, GPT-4.1, Gemini 2.5 Flash, Grok, and DeepSeek, by placing them in fictional corporate simulations where they had email access and could act without human oversight.

Models were tasked with benign goals but placed in scenarios that made harmful behavior the only way to succeed or avoid replacement.

elvis

@omarsar0

Jun 20

Future of Work with AI Agents

Stanford's new report analyzes what 1,500 workers think about working with AI agents.

What types of AI Agents should we build?

A few surprises!

Let's take a closer look:

Quick Overview

The audit proposes a large-scale framework for understanding where AI agents should automate or augment human labor.

The authors build the WORKBank, a database combining worker desires and expert assessments across 844 tasks and 104 occupations, and introduce the Human Agency Scale to quantify desired human involvement in AI-agent-supported work.

AI Automation or Not?

46.1% of tasks received positive worker attitudes toward automation, mainly to free up time for higher-value work.

Attitudes vary by sector; workers in creative or interpersonal fields (e.g., media, design) resist automation despite technical feasibility.
