5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.
Let's dive in to learn more!
1️⃣ Basic responder
- A human guides the entire flow.
- The LLM is just a generic responder that receives an input and produces an output. It has little control over the program flow.
See this visual👇
2️⃣ Router pattern
- A human defines the paths/functions that exist in the flow.
- The LLM makes basic decisions about which function or path to take.
Check this visual👇
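The router pattern can be sketched in a few lines of Python. This is a minimal illustration, not a real system: `llm_route` is a hypothetical stub standing in for an actual LLM call, and the route names are made up. The key idea is that a human defines the paths, and the model only picks one.

```python
# Minimal sketch of the router pattern (level 2).
# `llm_route` is a stub standing in for a real LLM call.

def answer_faq(query: str) -> str:
    return f"FAQ answer for: {query}"

def escalate_to_human(query: str) -> str:
    return f"Escalating to support: {query}"

# Human-defined paths: the LLM can only choose among these.
ROUTES = {"faq": answer_faq, "escalate": escalate_to_human}

def llm_route(query: str) -> str:
    # Stub: a real system would ask the LLM to return a route name.
    return "escalate" if "refund" in query.lower() else "faq"

def handle(query: str) -> str:
    route = llm_route(query)      # LLM decides which path
    return ROUTES[route](query)   # human-defined code runs it

print(handle("How do I get a refund?"))
```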
3️⃣ Tool calling
- A human defines a set of tools the LLM can access to complete a task.
- LLM decides when to use them and also the arguments for execution.
Check this visual👇
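Tool calling can be sketched similarly. Again, `llm_plan` is a hypothetical stub in place of a real LLM: the model's job is to return which tool to call and with what arguments (here as JSON), and the program executes it.

```python
import json

# Minimal sketch of tool calling (level 3).
# `llm_plan` is a stub standing in for a real LLM call.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def add(a: float, b: float) -> float:
    return a + b

# Human-defined tools the LLM can access.
TOOLS = {"get_weather": get_weather, "add": add}

def llm_plan(task: str) -> str:
    # Stub: a real LLM would produce this JSON from the task.
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})

def run(task: str):
    call = json.loads(llm_plan(task))           # LLM picks tool + args
    return TOOLS[call["tool"]](**call["args"])  # program executes it

print(run("What is 2 + 3?"))  # 5
```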
4️⃣ Multi-agent pattern
A manager agent coordinates multiple sub-agents and decides the next steps iteratively.
- A human lays out the hierarchy between agents, their roles, tools, etc.
- The LLM controls execution flow, deciding what to do next.
See this visual👇
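The multi-agent pattern, sketched with stubbed agents (all names are illustrative): a human lays out the sub-agents and their roles, and a manager, which would be an LLM in a real system, decides what runs next and passes results along.

```python
# Minimal sketch of the multi-agent pattern (level 4), with stubs.

def researcher(task: str) -> str:
    return f"notes on '{task}'"

def writer(notes: str) -> str:
    return f"draft based on {notes}"

# Human-defined hierarchy: sub-agents and their roles.
SUB_AGENTS = {"researcher": researcher, "writer": writer}

def manager(task: str) -> str:
    # Stub plan: a real manager LLM would decide these steps itself,
    # iteratively, based on intermediate results.
    result = task
    for agent_name in ["researcher", "writer"]:
        result = SUB_AGENTS[agent_name](result)  # delegate the step
    return result

print(manager("agentic AI"))
```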
5️⃣ Autonomous pattern
The most advanced pattern, wherein the LLM generates and executes new code independently, effectively acting as an autonomous AI developer.
Here's a visual to understand this👇
To recall:
1) Basic responder: only generates text.
2) Router pattern: decides which path to take.
3) Tool calling: picks & runs tools.
4) Multi-agent pattern: manages several agents.
5) Autonomous pattern: works fully independently.
Here's the visual again for your reference👇
That's a wrap!
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML in recommendation
- Spotify uses graph ML in recommendation
- Pinterest uses graph ML in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just as images, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let’s create a dummy social networking graph dataset with accounts and followers (which will also be accounts).
We create the two DataFrames shown below, an accounts DataFrame and a followers DataFrame.
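A quick sketch of those two DataFrames with pandas. The column names and values here are illustrative assumptions, not the exact ones from the original visual: an accounts table, and a followers table where both columns reference accounts.

```python
import pandas as pd

# Dummy social-network dataset (illustrative columns/values).
accounts = pd.DataFrame({
    "account_id": [1, 2, 3, 4],
    "name": ["alice", "bob", "carol", "dan"],
})

# Each row is a "follows" edge; followers are themselves accounts.
followers = pd.DataFrame({
    "account_id": [1, 1, 2, 3],   # the account being followed
    "follower_id": [2, 3, 3, 4],  # the follower
})

print(accounts.shape, followers.shape)
```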
"Our GPT model generates 100 tokens in 42 seconds.
How do you make it 5x faster?"
You: "I'll allocate more GPUs for faster generation."
Interview over.
Here's what you missed:
The real bottleneck isn't compute; it's redundant computation.
Without KV caching, your model recalculates keys and values for each token, repeating work.
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in to understand how it works!
To understand KV caching, we must know how LLMs output tokens.
- The transformer produces hidden states for all tokens.
- Hidden states are projected to the vocab space.
- Logits of the last token are used to generate the next token.
- Repeat for subsequent tokens.
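Here's a toy NumPy sketch of KV caching during decoding (all shapes and weights are illustrative, not a real model). Without a cache, keys and values for all past tokens get recomputed at every step; with a cache, each step computes K and V only for the new token and appends them.

```python
import numpy as np

d = 8  # toy hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def decode_step(new_x, cache):
    # Compute K/V for the NEW token only, then append to the cache.
    k_new, v_new = new_x @ Wk, new_x @ Wv
    cache["K"] = k_new if cache["K"] is None else np.vstack([cache["K"], k_new])
    cache["V"] = v_new if cache["V"] is None else np.vstack([cache["V"], v_new])
    # The query attends over ALL cached keys/values.
    q = new_x @ Wq
    scores = q @ cache["K"].T / np.sqrt(d)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ cache["V"]  # attended output for the new token

cache = {"K": None, "V": None}
for t in range(5):                    # generate 5 tokens
    x_t = rng.normal(size=(1, d))     # stand-in token embedding
    out = decode_step(x_t, cache)

print(cache["K"].shape)  # (5, 8) — one cached K row per generated token
```

Without the cache, step `t` would redo `t` key/value projections; with it, every step does exactly one.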
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
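As a quick sanity check on that quadratic scaling, a few lines of Python: the attention score matrix has seq_len² entries, so 8x the tokens means 64x the entries, and 2K → 128K (64x the tokens) means 4096x.

```python
# Quadratic attention cost: score matrix has seq_len**2 entries.
for seq_len in (2_048, 16_384, 131_072):  # 2K, 16K (8x), 128K (64x)
    ratio = seq_len**2 // 2_048**2
    print(f"{seq_len:>7} tokens -> {ratio}x the 2K attention cost")
```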
So, how do we manage it?
Continued below👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this has a trade-off between computational complexity and performance.
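Local attention is easy to see as a mask. This sketch (parameter names are illustrative) builds a causal sliding-window mask: each token attends only to itself and the previous few tokens, so the number of attended positions stays constant instead of growing with sequence length.

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query i may attend to key j: causal + within window."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (j > i - window)

mask = local_causal_mask(seq_len=6, window=3)
# Attended positions per token cap at `window` instead of growing:
print(mask.sum(axis=1))  # [1 2 3 3 3 3]
```

Full causal attention would give [1 2 3 4 5 6] here; capping it at the window size is what turns the quadratic cost into a linear one.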