Jun 13 • 11 tweets • 3 min read
Model Context Protocol (MCP), clearly explained:
MCP is like a USB-C port for your AI applications.
Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.
Let's dive in! 🚀
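To make the analogy concrete, here's a minimal MCP server sketch using the official Python SDK's FastMCP helper; the server name and the `add` tool are illustrative, not from the thread:

```python
# Minimal MCP server sketch (pip install mcp); the tool below is illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, so MCP hosts like Cursor can connect
```

Any MCP-compatible host can now discover and call `add` with no custom glue code.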
Jun 11 • 11 tweets • 3 min read
Object-oriented programming in Python, clearly explained:
We break it down to 6 important concepts:
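For a quick taste of what those concepts look like in code, here's a tiny hedged sketch touching three of them (classes, inheritance, polymorphism); the class names are made up for illustration:

```python
# A tiny illustration of a few core OOP ideas; the classes are hypothetical.
class Animal:                      # a class bundles data and behavior
    def __init__(self, name):
        self._name = name          # leading underscore: encapsulation by convention

    def speak(self):               # method meant to be overridden
        raise NotImplementedError

class Dog(Animal):                 # inheritance: Dog reuses Animal's code
    def speak(self):               # polymorphism: same interface, new behavior
        return f"{self._name} says woof"

print(Dog("Rex").speak())          # -> Rex says woof
```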
Self-attention in LLMs, clearly explained:
Before we start, a quick primer on tokenization!
Raw text → Tokenization → Embedding → Model
Embedding is a meaningful representation of each token (roughly a word) using a bunch of numbers.
This embedding is what we provide as an input to our language models.
Check this👇
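As a rough sketch of that pipeline in code (a toy word-level vocabulary and a randomly initialized embedding table, purely illustrative):

```python
import numpy as np

# Toy pipeline: raw text -> token ids -> embedding vectors (all values illustrative).
vocab = {"the": 0, "cat": 1, "sat": 2}              # hypothetical 3-word vocabulary
tokens = "the cat sat".split()                       # "tokenization" (word-level here)
ids = [vocab[t] for t in tokens]                     # [0, 1, 2]

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))   # 4-dim embeddings, untrained
embeddings = embedding_table[ids]                    # (3 tokens, 4 dims) -> model input
print(embeddings.shape)                              # (3, 4)
```

In a real LLM the embedding table is learned during training, and tokenization is subword-level rather than word-level.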
Jun 3 • 12 tweets • 4 min read
Let's build an MCP-powered Agentic RAG (100% local):
Below, we have an MCP-powered Agentic RAG that searches a vector database and falls back to web search if needed.
To build this, we'll use:
- @firecrawl_dev search endpoint for web search.
- @qdrant_engine as the vector DB.
- @cursor_ai as the MCP client.
Let's build it!
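Here's a hedged sketch of the core routing logic, assuming a running local Qdrant with a populated "docs" collection, a Firecrawl API key, and an `embed()` function that returns a query vector; the collection name and score threshold are placeholders:

```python
# Sketch of the "search the vector DB, fall back to the web" routing logic.
import os
from qdrant_client import QdrantClient
from firecrawl import FirecrawlApp

qdrant = QdrantClient(url="http://localhost:6333")
web = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])

def retrieve(query: str, embed) -> str:
    hits = qdrant.search(collection_name="docs", query_vector=embed(query), limit=3)
    if hits and hits[0].score > 0.5:          # relevance threshold (made up)
        return "\n".join(h.payload["text"] for h in hits)
    return str(web.search(query))             # fallback: Firecrawl's search endpoint
```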
Jun 1 • 10 tweets • 4 min read
Function calling & MCP for LLMs, clearly explained (with visuals):
Before MCP became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.
The visual below explains how Function Calling and MCP work under the hood.
Let's learn more!
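For reference, here's what a minimal function-calling round trip looks like with the OpenAI SDK; the `get_weather` tool schema and model name are illustrative:

```python
# A minimal function-calling round trip with the OpenAI SDK.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                    # hypothetical tool
        "description": "Get the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)
# The model replies with a structured tool call instead of plain text:
print(resp.choices[0].message.tool_calls)
```

MCP standardizes the layer around this: instead of each app hand-rolling tool schemas, servers advertise their tools to any MCP host.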
May 30 • 12 tweets • 4 min read
Let's build an MCP server that connects to 200+ data sources (100% local):
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @MindsDB to power our unified MCP server
- @cursor_ai as the MCP host
- @Docker to self-host the server
Let's go! 🚀
May 29 • 11 tweets • 4 min read
KV caching in LLMs, clearly explained (with visuals):
KV caching is a technique used to speed up LLM inference.
Before understanding the internal details, look at the inference speed difference in the video:
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in!
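To see the cache in action, here's a small sketch using Hugging Face transformers; GPT-2 and the prompt are just for illustration:

```python
# past_key_values stores attention keys/values so earlier tokens
# aren't recomputed on every decoding step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # tiny model, just for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("KV caching makes", return_tensors="pt").input_ids
past = None
for _ in range(10):                                    # greedy decoding, one token at a time
    out = model(ids if past is None else ids[:, -1:],  # only the newest token once cached
                past_key_values=past, use_cache=True)
    past = out.past_key_values                         # reuse cached keys/values next step
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0]))
```

Without the cache, every step would re-run attention over the full sequence, which is where the ~5x slowdown comes from.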
May 27 • 14 tweets • 5 min read
Let's build an MCP-powered financial analyst (100% local):
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @crewAIInc for multi-agent orchestration
- @Ollama to locally serve the DeepSeek-R1 LLM
- @cursor_ai as the MCP host
Let's go! 🚀
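As a flavor of the orchestration layer, here's a minimal CrewAI sketch under this stack; the role/goal text and the Ollama model string are assumptions, not the thread's exact config:

```python
# Minimal CrewAI agent + task wiring (illustrative settings).
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Financial analyst",
    goal="Analyze a stock and summarize key metrics",
    backstory="A meticulous markets researcher.",
    llm="ollama/deepseek-r1",          # assumed model string, served locally via Ollama
)
task = Task(
    description="Analyze recent performance of {ticker}.",
    expected_output="A short markdown report.",
    agent=analyst,
)
crew = Crew(agents=[analyst], tasks=[task])
print(crew.kickoff(inputs={"ticker": "NVDA"}))
```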
May 20 • 9 tweets • 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency—from simple responders to fully autonomous agents.
Let's dive in to learn more about them.
May 17 • 11 tweets • 5 min read
9 MCP, LLM, and AI Agent cheat sheets for AI engineers (with visuals):
1️⃣ Model Context Protocol
MCP is like a USB-C port for your AI applications.
Just as USB-C standardizes device connections, MCP standardizes AI app connections to data sources and tools.
Let's build an MCP-powered synthetic data generator (100% local):
Today, we're building an MCP server that every data scientist will love to have.
Tech stack:
- @cursor_ai as the MCP host
- @datacebo's SDV to generate realistic tabular synthetic data
Let's go! 🚀
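Here's a hedged sketch of the SDV side of this build; the toy DataFrame is made up, and the synthesizer choice is one reasonable default:

```python
# Generating synthetic tabular data with SDV (illustrative data and settings).
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

real = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40e3, 55e3, 72e3, 90e3]})

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)              # infer column types

synth = GaussianCopulaSynthesizer(metadata)
synth.fit(real)                                   # learn the joint distribution
print(synth.sample(num_rows=10))                  # 10 realistic-looking fake rows
```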
May 15 • 14 tweets • 5 min read
Let's build a multi-agent book writer, powered by Qwen3 (100% local):
Today, we are building an Agentic workflow that writes a 20k word book from a 3-5 word book title.
Tech stack:
- @firecrawl_dev for web scraping.
- @crewAIInc for orchestration.
- @ollama to serve Qwen 3 locally.
- @LightningAI for development and hosting.
Let's go! 🚀
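A quick sanity check that Qwen 3 is being served locally, using the Ollama Python client; the model tag and prompt are illustrative, and this assumes `ollama pull qwen3` has already been run:

```python
# Smoke-test the locally served model (pip install ollama).
import ollama

resp = ollama.chat(
    model="qwen3",  # assumed local model tag
    messages=[{"role": "user", "content": "Give me a 5-word book title."}],
)
print(resp["message"]["content"])
```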
May 9 • 7 tweets • 3 min read
Traditional RAG vs. Agentic RAG, clearly explained (with visuals):
Traditional RAG has many issues:
- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.
- It cannot reason through complex queries.
- The system can't modify its strategy based on the problem.
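Agentic RAG fixes these by putting retrieval inside a reasoning loop. Here's a hedged sketch of that pattern; every helper is a stub standing in for a real component (vector DB, LLM judge, web search, generator):

```python
# Stub helpers: real systems would back these with a vector DB, an LLM, and a search API.
def retrieve(q):            return []                            # stub: vector DB lookup
def web_search(q):          return [f"web results for {q!r}"]    # stub: web fallback
def is_sufficient(q, ctx):  return len(ctx) >= 2                 # stub: an LLM would judge
def rewrite_query(q, ctx):  return q + " (refined)"              # stub: LLM query rewrite
def answer(q, ctx):         return f"answer to {q!r} from {len(ctx)} snippets"

def agentic_rag(query: str) -> str:
    context = retrieve(query)                  # first pass: vector DB
    for _ in range(3):                         # unlike traditional RAG, we can loop
        if is_sufficient(query, context):      # judge whether context is enough
            break
        query = rewrite_query(query, context)  # reason about what's missing...
        context += web_search(query)           # ...and dynamically fetch more
    return answer(query, context)

print(agentic_rag("why is the sky blue?"))
```

The key difference: retrieve-judge-retry instead of retrieve-once, generate-once.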
May 5 • 13 tweets • 4 min read
How LLMs work, clearly explained:
Before diving into LLMs, we must understand conditional probability.
Let's consider a population of 14 individuals:
- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither
Here's how it looks 👇
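Here's the arithmetic spelled out; the split below (7 like football, 3 like both, 4 like neither) is assumed for illustration, as the thread's visual has the actual counts:

```python
# P(tennis | football) = P(tennis AND football) / P(football)
total = 14
football = 7           # assumed count
both = 3               # assumed count

p_football = football / total                 # P(football) = 7/14 = 0.5
p_both = both / total                         # P(tennis AND football) = 3/14
p_tennis_given_football = p_both / p_football
print(p_tennis_given_football)                # 3/7 ≈ 0.43
```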
May 4 • 7 tweets • 3 min read
5 amazing Jupyter Notebook tricks not known to many:
1️⃣ Retrieve a cell's output in Jupyter
If you often forget to assign the results of a Jupyter cell to a variable, you can use the `Out` dictionary to retrieve the output.
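A quick illustration as an IPython/Jupyter session transcript; `Out` maps each cell's execution count to its output:

```python
# IPython/Jupyter session transcript:
In [1]: 21 * 2
Out[1]: 42

In [2]: result = Out[1]    # recover cell 1's result even though it wasn't assigned
In [3]: result
Out[3]: 42
```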
Apr 30 • 9 tweets • 3 min read
Let's fine-tune DeepMind's latest Gemma 3 (100% locally):
Before we begin, here's what we'll be doing.
We'll fine-tune our private and locally running Gemma 3.
To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.
Let's begin!
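Here's a hedged sketch of the Unsloth setup; the model id, LoRA rank, and sequence length are illustrative defaults, not the thread's exact settings:

```python
# Load Gemma 3 in 4-bit and attach LoRA adapters with Unsloth (illustrative config).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",   # assumed checkpoint id
    max_seq_length=2048,
    load_in_4bit=True,                    # QLoRA-style memory savings
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                 # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, train with TRL's SFTTrainer on your dataset, then export to Ollama.
```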
Apr 26 • 16 tweets • 5 min read
Let's build an MCP-powered multi-agent deep researcher (100% local):
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @Linkup_platform for deep web research
- @crewAIInc for multi-agent orchestration
- @Ollama to locally serve DeepSeek
- @cursor_ai as MCP host
Let's go! 🚀
Apr 21 • 10 tweets • 4 min read
Transformer vs. Mixture of Experts in LLMs, clearly explained (with visuals):
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.
The visual below explains how they differ from Transformers.
Let's dive in to learn more about MoE!
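To ground the idea, here's a toy MoE layer in PyTorch: a router scores experts per token and only the top-k experts run, so a fraction of the parameters is active for any input. All sizes are illustrative:

```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)          # scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)])                 # FFNs a dense model would fuse
        self.k = k

    def forward(self, x):                                # x: (tokens, dim)
        scores = self.router(x).softmax(-1)              # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)         # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                       # weighted sum of chosen experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

print(MoE()(torch.randn(10, 64)).shape)                  # torch.Size([10, 64])
```

A dense Transformer would run one big FFN on every token; here each token touches only 2 of 4 experts.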
Apr 17 • 12 tweets • 5 min read
10 MCP, AI Agents, and RAG projects for AI Engineers (with code):
1️⃣ Real-time Voice RAG Agent
In this project, you'll learn how to build a real-time Voice RAG Agent.
You will also learn how to clone your voice in just 5 seconds.