Small Language Models are the Future of Agentic AI
Lots to gain from building agentic systems with small language models.
Capabilities are increasing rapidly!
AI devs should be exploring SLMs.
Here are my notes:
Overview
This position paper argues that small language models (SLMs), defined pragmatically as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications, especially when tasks are narrow, repetitive, or tool-oriented.
The authors propose that shifting from LLM-first to SLM-first architectures will yield major gains in efficiency, modularity, and sustainability.
SLMs are already capable of commonsense reasoning, instruction following, and code/tool interaction at levels comparable to 30–70B models, with orders of magnitude better throughput.
Examples include Phi-3, Hymba-1.5B, DeepSeek-R1-Distill, and RETRO-7.5B.
The economic benefits are significant: SLMs offer 10–30× lower inference cost than LLMs, require less parallel infrastructure, and are amenable to overnight fine-tuning and even edge deployment (e.g., ChatRTX).
This enables faster iteration and better data control.
SLMs support modular, composable agent systems where specialized models handle subtasks, resulting in better alignment, lower risk of hallucinations, and easier debugging.
The authors advocate for heterogeneous architectures, with SLMs as defaults and LLMs used selectively.
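As a sketch of that default-to-SLM, escalate-to-LLM pattern (the routing heuristic, task labels, and both model clients here are hypothetical illustrations, not code from the paper):

```python
from typing import Callable, Optional

# Task types the fine-tuned SLM experts are trusted to handle.
# Purely illustrative labels.
SLM_TASK_TYPES = {"tool_call", "summarize", "extract_json"}

def run(task: dict,
        call_slm: Callable[[dict], Optional[str]],
        call_llm: Callable[[dict], str]) -> str:
    """Route to a cheap SLM expert by default; use the LLM selectively."""
    if task["type"] in SLM_TASK_TYPES:
        draft = call_slm(task)
        if draft is not None:      # SLM produced a usable answer
            return draft
    return call_llm(task)          # escalate: open-ended or failed tasks
```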
A six-step LLM-to-SLM conversion algorithm is proposed, involving usage logging, task clustering, and PEFT fine-tuning.
This supports gradual migration from monolithic agents to SLM-based compositions.
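The six steps map naturally onto a pipeline skeleton like the one below. Every helper name is a hypothetical placeholder for tooling the paper describes only at the process level:

```python
def llm_to_slm_conversion(agent_logs):
    calls = collect_usage(agent_logs)          # S1: log agent calls securely
    calls = curate(calls)                      # S2: filter/scrub sensitive data
    clusters = cluster_tasks(calls)            # S3: group recurring subtasks
    for cluster in clusters:
        slm = select_candidate_slm(cluster)    # S4: pick a suitable SLM
        expert = peft_finetune(slm, cluster)   # S5: LoRA/QLoRA-style tuning
        deploy_and_route(expert, cluster)      # S6: swap in, iterate, refine
```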
Case studies on MetaGPT, Open Operator, and Cradle suggest 40–70% of LLM invocations can be reliably replaced with SLMs, particularly for structured generation and routine tool use.
The authors acknowledge that LLMs retain an advantage in general language understanding and that economic inertia favors their continued use, but they make a compelling case that SLM-centric systems better reflect real-world agentic requirements and resource constraints.
Just built a new skill in Claude Code using Opus 4.5.
The skill uses Gemini 3 Pro (via API) for designing web pages.
Look at what it generated from one simple prompt.
If you have been designing websites with Claude Code, you already know how generic they turn out.
So I built a skill that uses Gemini 3 Pro to lead creative direction and generate designs. It is extremely good at this.
Opus 4.5 then integrates all that into our app.
The prompt I used: "I want to design the landing page for a new AI game. We want it to be futuristic and all that, and use animations as much as possible."
I will test with some other prompts and see how far I can push this. But the results are very exciting already.
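For anyone curious what the Gemini side of such a skill can look like, here is a minimal sketch using the google-genai Python SDK. The model id, prompt framing, and helper name are my assumptions, not the actual skill code:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def design_page(brief: str) -> str:
    """Hypothetical helper: ask Gemini for creative direction + markup."""
    response = client.models.generate_content(
        model="gemini-3-pro-preview",  # assumed model id
        contents=(
            "You are the creative director for a web page. Return a short "
            "design rationale, then a single-file HTML/CSS/JS implementation."
            f"\n\nBrief: {brief}"
        ),
    )
    return response.text

print(design_page("Landing page for a new AI game: futuristic, animation-heavy."))
```

In the workflow described above, Opus 4.5 would then take the returned markup and handle the integration into the app.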
This is one of the most insane things Nano Banana Pro 🍌 can do.
It can reproduce figures with mind-blowing precision.
No competition in this regard!
Prompt: "Please reproduce this chart in high quality and fidelity and offer annotated labels to better understand it."
When I tried this for the first time, I didn't expect that this was possible.
The level of understanding this requires is what makes it so remarkable.
The levels of personalization this unlocks are also impressive.
"Can you convert it into a cartoonish version?"
Just look at this 🤯
"Can you create a delightful cartoonish version of this table. And please put cute colors and icons along with interesting annotations to make it more readable."