Shruti
Aug 2 · 15 tweets · 5 min read
The AI Industry Made a $57 Billion Mistake and No One’s Talking About It.

While GPT-5 headlines kept you distracted...

NVIDIA quietly released a bold claim:
→ Small Language Models (SLMs) are the future of AI agents

Cheaper, faster, and just as capable for 80% of real-world tasks.

Easily one of the biggest shifts in AI this year and most people missed it.

99% of people haven’t read this, but they should: 🧵
1/ The paper is titled:

“Small Language Models are the Future of Agentic AI”

Published by NVIDIA Research.

It challenges the core assumption behind every LLM deployment today:

"That you need GPT-4–level models to run useful agents."

According to the research, the truth is: you don't.

Now, let's dive deeper:
2/ LLMs like GPT-4 and Claude 3.5 are powerful, but most AI agents don’t need that power.

They handle narrow, repetitive tasks.

And SLMs can do those better, cheaper, and faster.
3/ What’s an SLM?

A Small Language Model is tiny enough to run on your laptop or edge device.

We're talking under 10B parameters: fast, fine-tunable, and private.

Think:
→ Lower latency
→ Offline control
→ Fraction of the cost
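To see why "under 10B parameters" is the threshold that matters, here's a back-of-envelope memory calculation (my own illustrative numbers, not figures from the paper): weight memory is roughly parameter count times bytes per parameter, so a quantized SLM fits on consumer hardware while a full-precision LLM does not.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

# A 7B-parameter SLM at 4-bit quantization (0.5 bytes/param)
slm = weight_memory_gb(7e9, 0.5)    # 3.5 GB -> fits on a laptop GPU
# A 70B-parameter LLM at fp16 (2 bytes/param)
llm = weight_memory_gb(70e9, 2.0)   # 140 GB -> needs multi-GPU servers

print(f"7B @ 4-bit:  {slm:.1f} GB")
print(f"70B @ fp16: {llm:.1f} GB")
```

That 40x memory gap is the whole on-device story in two lines of arithmetic.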
4/ What did NVIDIA actually say?

In their new paper, NVIDIA Research argues:

“SLMs are the future of agentic AI.”

→ Most AI agents just do narrow, repetitive tasks
→ They don’t need 70B+ parameters
→ SLMs (small language models) are cheaper, faster, and just as accurate in real workflows

Let that sink in!
5/ The math here is wild:

Newer SLMs like Phi-3 and Hymba match 30–70B models in tool use, reasoning, and instruction following:

→ Run 10× faster
→ Use 10–30× less compute in real workflows
6/ Serving GPT-4 is expensive.

Running a tuned SLM is 10–30x cheaper.

And you can deploy them:

→ On-device
→ With custom formats
→ Using tools like ChatRTX, LoRA, QLoRA

Enterprise-ready, budget-friendly.
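Part of why tuned SLMs are so cheap is that techniques like LoRA (mentioned above) train only small low-rank adapters instead of full weight matrices. A quick sketch of the arithmetic (illustrative sizes, not from the paper): for a d×k weight matrix, a rank-r adapter adds only r·(d+k) trainable parameters.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on a d x k weight matrix."""
    return r * (d + k)

d = k = 4096                              # hidden size typical of a ~7B model
full = d * k                              # full fine-tuning: ~16.8M params per matrix
lora = lora_trainable_params(d, k, r=8)   # rank-8 adapter: 65,536 params

print(f"Full matrix:        {full:,} params")
print(f"LoRA (r=8):         {lora:,} params")
print(f"Trainable fraction: {lora / full:.4%}")
```

Training well under 1% of the parameters per matrix is what makes per-task fine-tuning of SLMs a routine operation rather than a capital project.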
7/ NVIDIA tested this across 3 real-world AI agents:

- MetaGPT → 60% of tasks replaceable by SLMs
- Open Operator → 40%
- Cradle (GUI automation) → 70%

And those are today’s SLMs.

This paper could reshape how we build AI agents in the next decade.
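A quick way to see what those replacement rates buy you (my own illustrative arithmetic, not figures from the paper): if a fraction of an agent's calls moves to an SLM that costs, say, 20x less per call, the blended cost falls sharply.

```python
def blended_cost(slm_fraction: float, slm_cost_ratio: float) -> float:
    """Relative cost vs. an all-LLM baseline, when slm_fraction of calls
    run on an SLM costing slm_cost_ratio of an LLM call (0.05 = 20x cheaper)."""
    return (1 - slm_fraction) + slm_fraction * slm_cost_ratio

# MetaGPT-style agent: 60% of calls replaceable, SLM 20x cheaper per call
cost = blended_cost(0.60, 1 / 20)
print(f"Blended cost: {cost:.0%} of the all-LLM baseline")  # 43%
```

So a 60% replacement rate alone cuts serving cost by more than half, before counting latency or privacy wins.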
8/ Why This Matters for AGI:

The path to human-like agents isn’t bigger models.
It’s modular ones.

SLMs can be specialists, like tools in a toolbox.

And that’s exactly how human reasoning works.
9/ The Moral Edge

SLMs aren’t just efficient, they’re ethical.

They:
→ Reduce energy usage
→ Enable edge privacy
→ Empower smaller teams & communities

LLMs centralize power. SLMs distribute it.
10/ So why is nobody using them?

NVIDIA lists 3 reasons:

→ $57B was invested in centralized LLM infrastructure in 2024, and SLMs now challenge that model on performance, cost, and flexibility
→ Benchmarking is still biased toward “bigger is better”
→ SLMs get zero hype compared to GPT-4, Claude, etc.

This paper flips that.

People just don’t know what SLMs can do (yet)

But that’s changing fast.
11/ NVIDIA even outlined a step-by-step migration framework for converting LLM-based agents into SLM-first systems:

→ How to fine-tune SLMs for specific tasks
→ How to cluster tasks and build SLM “skills”
→ How to scale it all locally, if needed

They’re not guessing.
They built the roadmap.
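The migration idea above can be sketched as an "SLM-first" router: send tasks to a cheap fine-tuned specialist when one matches, and escalate everything else to a general-purpose LLM. This is my own minimal sketch of the pattern, not NVIDIA's implementation; the model calls are stubbed out.

```python
from typing import Callable, Dict

# Stub "models": in practice these would call a local SLM or a remote LLM API.
def slm_extract(task: str) -> str:
    return f"[SLM:extract] handled: {task}"

def slm_summarize(task: str) -> str:
    return f"[SLM:summarize] handled: {task}"

def llm_fallback(task: str) -> str:
    return f"[LLM] handled: {task}"

# Clustered task "skills", each served by a fine-tuned SLM specialist.
SKILLS: Dict[str, Callable[[str], str]] = {
    "extract": slm_extract,
    "summarize": slm_summarize,
}

def route(intent: str, task: str) -> str:
    """SLM-first routing: use a specialist if one matches the intent,
    otherwise escalate to the general-purpose LLM."""
    handler = SKILLS.get(intent, llm_fallback)
    return handler(task)

print(route("extract", "pull invoice numbers from email"))   # hits the SLM
print(route("plan", "design a multi-step research agenda"))  # no specialist -> LLM
```

Logging which intents hit the fallback tells you which task cluster to fine-tune an SLM for next, which is exactly the clustering step in the framework above.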
12/ So what does this mean?

→ Most AI agents today are overbuilt
→ You might be paying 20x more for marginal gains
→ You’re also locked into centralized APIs

SLMs break that model.
And NVIDIA just made the case for switching at scale.
13/ This isn’t anti-GPT.

It’s post-GPT.

LLMs gave us the spark.
SLMs will give us the system.

The next 100 million agents won’t run on GPT-4.
They’ll run on tiny, specialized, ultra-cheap models.

And that’s not a theory. It’s already happening.
14/ TL;DR

The AGI race won’t be won with trillion-token giants.

The path to scalable agentic systems isn’t just bigger models. It’s modular, fine-tuned, and specialized, powered by SLMs.

📄 Read the paper: arxiv.org/abs/2506.02153


