Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Jul 27 • 11 tweets • 4 min read
KV caching in LLMs, clearly explained (with visuals):
KV caching is a technique used to speed up LLM inference.
Before understanding the internal details, look at the inference speed difference in the video:
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in!
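Here's the core idea as a minimal NumPy sketch (toy dimensions and random weights, purely illustrative — not an actual LLM): at each decoding step, K and V are computed only for the newest token, while everything earlier is reused from the cache.

```python
# Minimal sketch of KV caching in single-head attention (NumPy).
# Shapes and weights are toy values for illustration.
import numpy as np

d = 64                        # head dimension
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
K_cache, V_cache = [], []     # grows by one row per generated token

def decode_step(x):
    """x: embedding of the newest token, shape (d,)."""
    q = x @ Wq
    # Only the new token's K/V are computed; older rows come from the cache.
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    K = np.stack(K_cache)     # (t, d) — reused, never recomputed
    V = np.stack(V_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V        # attention output for the new token

for t in range(5):            # autoregressive decoding loop
    out = decode_step(np.random.randn(d))
```

Without the cache, every step would recompute K/V for the entire prefix — which is exactly where the 42-second run loses its time.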
Jul 26 • 9 tweets • 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.
Let's dive in to learn more!
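As a rough sketch of what rising agency looks like in code (all names here are hypothetical; the actual five levels are in the visual), three of the levels might be:

```python
# Conceptual sketch of increasing agency — llm/tools are hypothetical callables.
def level_1_responder(llm, query):
    return llm(query)                      # generates text, nothing more

def level_2_tool_user(llm, query, tools):
    choice = llm(f"Pick a tool for: {query}. Options: {list(tools)}")
    if choice in tools:
        return tools[choice](query)        # the model decides; code executes
    return llm(query)

def level_3_autonomous_loop(llm, goal, tools, max_steps=5):
    history = []
    for _ in range(max_steps):             # the agent iterates until done
        action = llm(f"Goal: {goal}. History: {history}. Next action or DONE?")
        if action == "DONE":
            break
        history.append(tools.get(action, llm)(goal))
    return history
```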
Jul 24 • 15 tweets • 5 min read
Let's compare Qwen 3 Coder & Sonnet 4 for code generation:
Qwen 3 Coder is Alibaba’s most powerful open-source coding LLM.
Today, let's build a pipeline to compare it to Sonnet 4 using:
- @LiteLLM for orchestration.
- @deepeval to build the eval pipeline (open-source).
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.
Let's dive in!
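A hedged sketch of what such a pipeline can look like — the model slugs and the G-Eval criteria below are assumptions, not the thread's exact setup:

```python
# Sketch: generate with LiteLLM, score with deepeval's G-Eval.
# Model IDs are assumptions — check openrouter.ai / Anthropic docs.
from litellm import completion
from deepeval import evaluate
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

PROMPT = "Write a Python function that checks if a string is a palindrome."

def generate(model_id):
    resp = completion(model=model_id,
                      messages=[{"role": "user", "content": PROMPT}])
    return resp.choices[0].message.content

qwen_out = generate("openrouter/qwen/qwen3-coder")
sonnet_out = generate("anthropic/claude-sonnet-4-20250514")

correctness = GEval(
    name="Code correctness",
    criteria="Is the generated code correct, idiomatic, and complete?",
    evaluation_params=[LLMTestCaseParams.INPUT,
                       LLMTestCaseParams.ACTUAL_OUTPUT],
)

evaluate(
    test_cases=[LLMTestCase(input=PROMPT, actual_output=qwen_out),
                LLMTestCase(input=PROMPT, actual_output=sonnet_out)],
    metrics=[correctness],
)
```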
Jul 21 • 13 tweets • 5 min read
4 stages of training LLMs from scratch, clearly explained (with visuals):
Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases.
Before we dive in, the following visual covers what we are discussing today.
Let's understand them in detail below!
Jul 19 • 13 tweets • 4 min read
Andrew Ng's team once made a big mistake in a research paper.
And it happened due to randomly splitting the data.
Here's what happened:
It is common to generate train and validation sets using random splitting.
However, in many situations, it can be fatal for model building.
Let's learn below!
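One classic way this goes wrong is group leakage: several samples from the same entity (e.g., multiple X-ray images of one patient) land in both train and validation, inflating validation scores. A scikit-learn sketch of the group-aware fix, assuming a per-patient `groups` array (data here is synthetic):

```python
# Sketch: split by group so no patient appears in both sets.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)
groups = np.repeat(np.arange(20), 5)     # 20 patients, 5 samples each

# Random split: samples from one patient can leak into both sets.
X_tr, X_val = train_test_split(X, test_size=0.2, random_state=0)

# Group-aware split: every patient lands entirely in train OR validation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=groups))
```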
Jul 18 • 12 tweets • 5 min read
After MCP, A2A, & AG-UI, there's another Agent protocol.
It's fully open-source and launched by IBM Research.
Here's a complete breakdown (with code):
ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.
Just like A2A, it enables Agent-to-Agent communication, though with some differences we'll discuss later.
Let's dive into the code first!
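The actual ACP SDK is what the thread's code covers; as a conceptual stand-in, here's what a RESTful agent interface can look like (FastAPI, with hypothetical routes — this is not the real ACP API):

```python
# Hypothetical REST shape for agent discovery + invocation.
# Illustrative only; ACP's real SDK and routes are in the thread.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RunRequest(BaseModel):
    input: str

@app.get("/agents")                      # discovery: list hosted agents
def list_agents():
    return [{"name": "echo", "description": "Echoes its input"}]

@app.post("/agents/echo/runs")           # coordination: invoke an agent
def run_echo(req: RunRequest):
    return {"output": req.input}
```

Because the interface is plain REST, an agent built in one framework can discover and call an agent built in another — that's the framework-agnostic part.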
Jul 17 • 10 tweets • 3 min read
How to compress ML models, clearly explained (with code):
Model performance is rarely the only factor in deciding which model gets deployed.
Instead, we also consider several operational metrics depicted below.
Knowledge distillation (KD) is a popular technique for compressing ML models before deployment.
Let's learn about it below.
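For the core mechanic, here's a minimal PyTorch sketch of the standard KD loss — temperature-softened KL between teacher and student, plus cross-entropy on the true labels (hyperparameters are illustrative):

```python
# Standard distillation recipe (not the thread's exact code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    # Soft targets: student mimics the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                            # rescale gradients by T^2
    # Hard targets: student still learns the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```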
Jul 15 • 13 tweets • 4 min read
Let's build an MCP-powered financial analyst (100% local):
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @crewAIInc for multi-agent orchestration
- @Ollama to locally serve DeepSeek-R1 LLM
- @cursor_ai as the MCP host
Let's go! 🚀
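A hedged sketch of the CrewAI + Ollama side of this stack — the model slug, role, and task text are assumptions, not the thread's exact setup:

```python
# Sketch: a CrewAI agent backed by a local DeepSeek-R1 served via Ollama.
# Assumes `ollama pull deepseek-r1` has been run locally.
from crewai import Agent, Task, Crew, LLM

llm = LLM(model="ollama/deepseek-r1")

analyst = Agent(
    role="Financial analyst",
    goal="Analyze a stock and summarize key trends",
    backstory="An experienced markets analyst.",
    llm=llm,
)

task = Task(
    description="Summarize recent price trends for AAPL.",
    expected_output="A short, structured analysis.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
print(crew.kickoff())
```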
Jul 11 • 14 tweets • 5 min read
How to sync GPUs in multi-GPU training, clearly explained (with visuals):
One major runtime bottleneck in multi-GPU training is GPU synchronization.
For instance, in multi-GPU training via data parallelism:
- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.
Check this 👇
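Conceptually, the sync step averages every gradient across GPUs after each backward pass. A minimal torch.distributed sketch of that step (DistributedDataParallel automates exactly this, overlapping it with compute):

```python
# Sketch of the gradient sync in data parallelism.
# Assumes dist.init_process_group(...) was called at startup.
import torch.distributed as dist

def sync_gradients(model):
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # Sum this gradient across every GPU, then average.
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

# In the training loop: loss.backward(); sync_gradients(model); optimizer.step()
```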
Jul 10 • 7 tweets • 3 min read
Naive RAG vs. Agentic RAG, clearly explained (with visuals):
Naive RAG has many issues:
- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.
- It cannot reason through complex queries.
- The system can't modify its strategy based on the problem.
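Agentic RAG closes these gaps with a retrieve-assess-retry loop. A toy sketch, where `retrieve` and `llm` are hypothetical callables rather than any specific framework's API:

```python
# Toy agentic-RAG loop: retrieve, judge sufficiency, rewrite, retry.
def agentic_rag(query, retrieve, llm, max_rounds=3):
    context = []
    for _ in range(max_rounds):
        context += retrieve(query)
        verdict = llm(f"Context: {context}\nQuestion: {query}\n"
                      "Is this context sufficient? Answer YES or NO.")
        if verdict.strip().upper().startswith("YES"):
            break
        # Not enough info: let the model rewrite the query and search again.
        query = llm(f"Rewrite this query to find the missing info: {query}")
    return llm(f"Context: {context}\nAnswer the question: {query}")
```

Naive RAG is this loop with `max_rounds=1` and no sufficiency check — one retrieval, one generation, no recourse.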
Jul 8 • 13 tweets • 4 min read
How LLMs work, clearly explained (with visuals):
Before diving into LLMs, we must understand conditional probability.
Let's consider a population of 14 individuals:
- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither
Here's how it looks 👇
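With assumed counts (the thread's visual uses its own numbers), the conditional probability falls out directly:

```python
# Worked example with assumed counts: 6 tennis-only, 4 football-only,
# 3 both, 1 neither — 14 people total.
total = 14
tennis_only, football_only, both, neither = 6, 4, 3, 1

p_football = (football_only + both) / total    # P(F) = 7/14
p_both = both / total                          # P(T and F) = 3/14

# Conditional probability: P(T | F) = P(T and F) / P(F)
p_tennis_given_football = p_both / p_football
print(p_tennis_given_football)                 # 3/7 ≈ 0.43
```

An LLM does the same thing at scale: given the tokens so far (the condition), it assigns a probability to every possible next token.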
Jul 3 • 10 tweets • 4 min read
uv in Python, clearly explained (with code):
uv is incredibly fast.
- Creating virtual environments with uv is ~80x faster than python -m venv.
- Package installation is 4–12x faster than pip without caching, and ~100x faster with caching.
Today, let's understand how to use uv for Python package management.
Let's dive in!
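A few representative commands (verify flags against uv's docs for your version):

```bash
uv venv                      # create a virtual environment (.venv)
uv pip install requests      # pip-compatible interface, much faster installs
uv init my-project           # scaffold a project with pyproject.toml
uv add pandas                # add a dependency to the project
uv run python main.py        # run inside the project's environment
```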
Jun 29 • 8 tweets • 3 min read
MCP & A2A (Agent2Agent) protocol, clearly explained (with visuals):
Agentic applications require both A2A and MCP.
- MCP provides agents with access to tools.
- A2A allows agents to connect with other agents and collaborate in teams.
Today, let's clearly understand what A2A is and how it can work with MCP.
Jun 26 • 13 tweets • 7 min read
10 GitHub repos that will set you up for a career in AI engineering (100% free):
1️⃣ ML for Beginners by Microsoft
A 12-week, project-based curriculum that teaches classical ML with Scikit-learn using real-world datasets.
Includes quizzes, R/Python lessons, and hands-on projects. Some lessons are also available as short-form videos.
Check this👇
Jun 25 • 9 tweets • 3 min read
How Agents test Agents, clearly explained (with code):
Today, we'll learn Agent testing by building a pipeline where Agents test other Agents, using Scenario.
Our open-source tech stack:
- @crewAIInc for Agent orchestration.
- @LangWatchAI Scenario to build the eval pipeline.
- @pytestdotorg as the runner.
Let's begin!
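Scenario's actual API is what the thread walks through; as a conceptual stand-in, here's the shape of agents-testing-agents under pytest, with hypothetical helpers — one agent plays the user, another judges the transcript:

```python
# Conceptual sketch only — `simulated_user`, `judge`, and the stand-in
# agent are hypothetical, not Scenario's real API.
import pytest

def simulated_user(turn):
    return ["I want to book a flight.", "Tomorrow, to Delhi."][turn]

def judge(transcript):
    # In practice, an LLM judges the conversation against criteria.
    return all(reply for _, reply in transcript)

@pytest.mark.parametrize("turns", [2])
def test_booking_conversation(turns):
    agent = lambda msg: f"Sure, noted: {msg}"   # stand-in for the real agent
    transcript = []
    for t in range(turns):
        msg = simulated_user(t)
        transcript.append((msg, agent(msg)))
    assert judge(transcript)
```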
Jun 24 • 10 tweets • 3 min read
Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally:
Before we begin, here's what we'll be doing.
To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.
Let's begin!
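A hedged sketch of the Unsloth side — the model slug and LoRA hyperparameters are assumptions; adjust them to your hardware:

```python
# Sketch: 4-bit LoRA fine-tuning setup with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",  # assumed slug
    max_seq_length=2048,
    load_in_4bit=True,            # fit the model on a single consumer GPU
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, train with e.g. trl's SFTTrainer, then export to Ollama.
```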
Jun 22 • 14 tweets • 5 min read
Let's build an MCP server (100% locally):
Before diving in, here's what we'll be doing today:
- Understand MCP with a simple analogy.
- Build a local MCP server and interact with it via @cursor_ai.
- Integrate @Stagehanddev MCP and interact with it via Claude Desktop (shown in the video).
Let's dive in!
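For reference, a minimal local server built with the official MCP Python SDK's FastMCP (the tool itself is illustrative):

```python
# Minimal local MCP server; Cursor/Claude Desktop connect over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run(transport="stdio")
```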
Jun 19 • 12 tweets • 6 min read
AI Engineering Hub is about to cross 10k GitHub stars!
It’s 100% open-source and hosts 70+ free hands-on demos.
Here are 10 MCP, RAG, and AI Agents projects for AI engineers:
1️⃣ MCP-powered RAG over videos
Learn how to build a video RAG that ingests a video and lets you chat with it. It also fetches the exact video chunk where an event occurred.
Let's build an MCP-powered RAG over videos, step-by-step:
Our tech stack:
- @ragieai for video ingestion and retrieval.
- @cursor_ai as the MCP host.
Let's build it!
Jun 14 • 10 tweets • 3 min read
Transformer vs. Mixture of Experts in LLMs, clearly explained (with visuals):
Mixture of Experts (MoE) is a popular architecture that swaps a Transformer's dense feed-forward layers for multiple "expert" networks, routing each token to only a few of them.
The visual below explains how they differ from Transformers.
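To make the difference concrete, here's a toy top-k MoE layer in PyTorch — a router picks k experts per token, so only a fraction of the parameters is active for any given token (dimensions are illustrative, not the thread's exact diagram):

```python
# Toy top-k MoE layer: router + expert FFNs + weighted mixing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                   # x: (tokens, d_model)
        logits = self.router(x)             # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for rank in range(self.k):          # only k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, rank] == e
                if mask.any():
                    out[mask] += weights[mask, rank, None] * self.experts[e](x[mask])
        return out

y = MoELayer()(torch.randn(10, 64))
```

A dense Transformer FFN is the degenerate case: one expert, always active. MoE trades that for more total parameters at roughly constant per-token compute.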