Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Mar 25 • 14 tweets • 5 min read
Let's build an MCP server (100% locally):
Before diving in, here's what we'll be doing:
- Understand MCP with a simple analogy.
- Build a local MCP server and interact with it via @cursor_ai IDE.
- Integrate @firecrawl_dev's MCP server and interact with its tools (shown in the video).
Let's dive in 🚀!
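Here's a minimal sketch of what such a server can look like, using the official MCP Python SDK (`pip install mcp`); the server and tool names are illustrative, not the thread's exact code:

```python
# A minimal local MCP server sketch using the official Python SDK.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")  # illustrative server name

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # stdio transport lets an MCP client like Cursor spawn this server
    # as a subprocess and exchange JSON-RPC messages with it
    mcp.run(transport="stdio")
```

Point Cursor's MCP settings at `python server.py` and the `add` tool becomes callable from the IDE.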
Mar 21 • 9 tweets • 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency—from simple responders to fully autonomous agents.
Let's dive in to learn more about them.
Mar 15 • 9 tweets • 3 min read
Let's fine-tune DeepMind's latest Gemma 3 (100% locally):
Before we begin, here's what we'll be doing.
We'll fine-tune our private and locally running Gemma 3.
To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.
Let's begin!
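Here's a minimal sketch of the approach, assuming Unsloth's `FastLanguageModel` API with a 4-bit LoRA setup; the checkpoint name and hyperparameters are illustrative, and argument names vary a bit across trl versions:

```python
# Minimal LoRA fine-tuning sketch with Unsloth + TRL (illustrative settings).
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps memory needs modest
)
# attach small trainable LoRA adapters instead of updating all weights
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = Dataset.from_dict({"text": [  # toy stand-in for a real dataset
    "### Instruction: Say hi\n### Response: Hi!",
]})

trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset, dataset_text_field="text",
    args=TrainingArguments(per_device_train_batch_size=2,
                           max_steps=60, output_dir="outputs"),
)
trainer.train()
```

After training, the adapters can be merged and the model exported for local serving with Ollama.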
Mar 14 • 15 tweets • 5 min read
Let's build a multi-agent book writer, powered by DeepMind's Gemma 3 (100% local):
Today, we are building an agentic workflow that writes a 20k-word book from a 3-5 word title.
Tech stack:
- Bright Data to scrape the web at scale.
- @crewAIInc for orchestration.
- @GoogleDeepMind's Gemma 3 as the LLM.
- @ollama to serve Gemma 3 locally.
Let's build it!
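A pared-down sketch of the idea with CrewAI talking to an Ollama-served Gemma 3 (the agent roles, tasks, and model tag are illustrative; the full thread adds web research via Bright Data):

```python
# Two-agent outline-then-write crew, powered by a local model via Ollama.
from crewai import Agent, Task, Crew, LLM

llm = LLM(model="ollama/gemma3", base_url="http://localhost:11434")

outliner = Agent(
    role="Outliner",
    goal="Turn a short book title into a detailed chapter outline",
    backstory="An experienced book editor.",
    llm=llm,
)
writer = Agent(
    role="Writer",
    goal="Write full chapters from the outline",
    backstory="A prolific long-form author.",
    llm=llm,
)

outline_task = Task(
    description="Create a 10-chapter outline for the book: {title}",
    expected_output="A numbered chapter outline",
    agent=outliner,
)
writing_task = Task(
    description="Write each chapter from the outline in order",
    expected_output="Complete chapter drafts",
    agent=writer,
)

crew = Crew(agents=[outliner, writer], tasks=[outline_task, writing_task])
print(crew.kickoff(inputs={"title": "A Brief History of Tea"}))
```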
Mar 13 • 7 tweets • 2 min read
Let's build a mini-ChatGPT with Google DeepMind's latest Gemma 3 (100% local):
Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would chat with ChatGPT.
It uses:
- @GoogleDeepMind's Gemma 3 as the LLM.
- @Ollama to locally serve Gemma 3.
- @chainlit_io for the UI.
Let's build it!
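Almost the entire app fits below; a minimal sketch assuming the `ollama` Python client and a `gemma3` model tag (run it with `chainlit run app.py`):

```python
# Mini-ChatGPT: Chainlit UI + locally served Gemma 3 via Ollama.
import chainlit as cl
import ollama

@cl.on_message
async def main(message: cl.Message):
    # forward the user's message to the local model and return its reply;
    # this sketch is stateless (no chat history carried between turns)
    response = ollama.chat(
        model="gemma3",
        messages=[{"role": "user", "content": message.content}],
    )
    await cl.Message(content=response["message"]["content"]).send()
```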
Mar 12 • 10 tweets • 3 min read
Model compression in ML, clearly explained (with code):
Model performance is rarely the only factor that determines which model gets deployed.
Instead, we also consider several operational metrics depicted below.
Knowledge distillation (KD) is a popular technique for compressing ML models before deployment.
Let's learn about it below.
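The core of KD fits in a few lines: the student is trained to match the teacher's softened output distribution while still fitting the true labels. A minimal PyTorch sketch:

```python
# Knowledge-distillation loss: KL term against the teacher's softened
# logits, blended with ordinary cross-entropy on the ground-truth labels.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales the KD gradient to a magnitude comparable to the CE term
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```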
Mar 4 • 10 tweets • 3 min read
Let's build a RAG app over audio files with DeepSeek-R1 (running locally):
Before we begin, here's a quick demo of what we're building!
We will use:
- @AssemblyAI for transcribing audio files.
- @qdrant_engine for the vector database.
- @llama_index for orchestration.
- DeepSeek-R1 as the LLM.
Let's dive in!
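In outline, the pipeline is: transcribe, embed, index, query. A condensed sketch with the stack above (the API key, file name, model tags, and embedding model are illustrative):

```python
# Audio RAG: AssemblyAI transcript -> Qdrant index -> local DeepSeek-R1.
import assemblyai as aai
import qdrant_client
from llama_index.core import (Document, Settings, StorageContext,
                              VectorStoreIndex)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.qdrant import QdrantVectorStore

aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"
transcript = aai.Transcriber().transcribe("meeting.mp3")  # speech -> text

Settings.llm = Ollama(model="deepseek-r1")  # assumed Ollama tag
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

store = QdrantVectorStore(client=qdrant_client.QdrantClient(":memory:"),
                          collection_name="audio_rag")
index = VectorStoreIndex.from_documents(
    [Document(text=transcript.text)],
    storage_context=StorageContext.from_defaults(vector_store=store),
)
print(index.as_query_engine().query("What was discussed?"))
```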
Mar 3 • 4 tweets • 1 min read
Learned this cool way of creating plots in a DataFrame's cell.
These are called sparklines. The idea is to render a plot within an HTML image tag in Jupyter.
Here's the code that adds a histogram in the cell:
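The original snippet shipped as an image; here's a sketch of the technique it describes (render each row's histogram with matplotlib, base64-encode it, and embed it via an `<img>` tag):

```python
# Sparklines: tiny per-row histograms rendered inside DataFrame cells.
import base64, io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML

def sparkline_hist(values):
    fig, ax = plt.subplots(figsize=(2, 0.5))
    ax.hist(values, bins=10)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    b64 = base64.b64encode(buf.getvalue()).decode()
    return f'<img src="data:image/png;base64,{b64}"/>'

df = pd.DataFrame({"name": ["a", "b"],
                   "data": [np.random.randn(100), np.random.randn(100)]})
df["plot"] = df["data"].apply(sparkline_hist)
# escape=False keeps the <img> tags as live HTML instead of literal text
HTML(df.drop(columns="data").to_html(escape=False))
```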
Feb 25 • 10 tweets • 3 min read
Transformer vs. Mixture of Experts in LLMs, clearly explained (with visuals):
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.
The visual below explains how they differ from Transformers.
Let's dive in to learn more about MoE!
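To make the difference concrete, here's a toy MoE layer in PyTorch: a router scores each token, and a weighted mix of the top-k expert FFNs replaces the single dense FFN of a standard Transformer block. (For readability every expert runs on every token here; real MoE implementations compute only the routed experts.)

```python
# Toy Mixture-of-Experts layer: router + top-k combination of expert FFNs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # combine the chosen experts
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).float()
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

print(MoELayer()(torch.randn(1, 8, 64)).shape)  # torch.Size([1, 8, 64])
```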
Feb 24 • 9 tweets • 3 min read
Tool calling in LLMs, clearly explained (with code):
When generating text, an LLM may need to invoke external tools or APIs to perform tasks beyond its built-in capabilities.
This is known as tool calling, and it turns the LLM into a coordinator that delegates work to the right tool.
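A minimal sketch with the OpenAI SDK (the pattern is similar across providers; the weather tool here is a made-up example): you describe the tool's schema, the model returns a structured call, and your code executes it.

```python
# Tool calling: the model emits a structured function call; your code runs it.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
# next: run get_weather(...), append the result as a "tool" message,
# and call the model again so it can compose the final answer
```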
Feb 22 • 9 tweets • 3 min read
Kernel trick in ML, clearly explained (with visuals):
So many ML algorithms use kernels for robust modeling:
• SVM
• KernelPCA, and more.
Consider two n-dimensional vectors X and Y. A kernel function lets us compute their dot product in m-dimensional space (m>>n) without even knowing the mapping.
Confused? Let's dive in!
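A tiny numeric check makes it click. For the polynomial kernel K(x, y) = (x·y)², the kernel value equals a dot product in a 3-D feature space that we never have to construct:

```python
# Kernel trick check: (x . y)^2 equals phi(x) . phi(y) for the explicit
# feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
import numpy as np

def phi(v):
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(x, y) ** 2)        # kernel form: 121.0
print(np.dot(phi(x), phi(y)))   # explicit mapping: 121.0
```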
Feb 17 • 7 tweets • 2 min read
Eigenvalues and eigenvectors, clearly explained:
The concept of eigenvalues & eigenvectors is widely used in data science but not well understood!
Today, I'll clearly explain their meaning & significance.
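The one-line intuition: an eigenvector is a direction the matrix only stretches, and the eigenvalue is the stretch factor (A·v = λ·v). A quick NumPy check:

```python
# Verify A @ v == lambda * v for an eigenpair found by np.linalg.eig.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

v, lam = eigvecs[:, 0], eigvals[0]
print(A @ v)      # same direction...
print(lam * v)    # ...just scaled by the eigenvalue
```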
Feb 14 • 11 tweets • 4 min read
KV caching in LLMs, clearly explained (with visuals):
KV caching is a technique used to speed up LLM inference.
Before understanding the internal details, look at the inference speed difference in the video:
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in!
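The cache helps because, without it, every new token forces the model to recompute keys and values for the entire prefix; with it, they're computed once and reused. You can reproduce the effect with Hugging Face Transformers, where `use_cache` toggles KV caching in `generate()` (GPT-2 here is just a small stand-in model):

```python
# Compare generation time with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The quick brown fox", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    model.generate(**inputs, max_new_tokens=200, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.time() - start:.1f}s")
```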
Feb 8 • 9 tweets • 3 min read
Let's build our own reasoning model (like DeepSeek-R1) 100% locally:
Before we begin, here's what we'll be doing.
We'll train our own reasoning model like DeepSeek-R1 (check the image).
To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- Llama 3.1-8B as the base LLM we'll add reasoning capabilities to.
Let's implement this.
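A heavily simplified sketch of the training loop using TRL's GRPOTrainer, which Unsloth accelerates; the checkpoint name, the toy dataset, and the reward function (rewarding answers that show their reasoning in `<think>` tags) are all illustrative:

```python
# GRPO sketch: a reward function scores sampled completions, and the
# trainer nudges the model toward higher-reward behavior.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # toy reward: +1 when the model wraps its reasoning in <think> tags
    return [1.0 if "<think>" in c and "</think>" in c else 0.0
            for c in completions]

dataset = Dataset.from_dict({"prompt": ["What is 17 * 23?"] * 8})

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed checkpoint
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-out", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()
```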
Feb 6 • 9 tweets • 3 min read
Decorators in Python, clearly explained (with code):
Decorators are one of the most powerful features of Python!
However, understanding them can be a bit overwhelming!
Today, let's understand how decorators work!
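The classic first example: a decorator is just a function that takes a function and returns a wrapped version of it. Here's a timing decorator:

```python
# A decorator adds behavior (timing) without touching the function's body.
import time
from functools import wraps

def timed(func):
    @wraps(func)  # preserves the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def slow_square(n):
    time.sleep(0.1)
    return n * n

slow_square(4)  # prints something like: slow_square took 0.1003s
```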
Feb 1 • 10 tweets • 3 min read
90% of Python programmers don't know these 7 uses of underscore:
1) Retrieve the last computed value
You can retrieve the last computed value using underscore, as demonstrated below:
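The original demo was an image; in a REPL or Jupyter cell it looks like this:

```python
>>> 5 + 3
8
>>> _ * 2   # `_` holds the last evaluated expression
16
```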
Jan 31 • 7 tweets • 2 min read
Traditional RAG vs. Graph RAG, clearly explained (with visuals):
RAG has many issues.
Imagine you want to summarize a biography, and each chapter of the document covers a specific accomplishment of a person (P).
This is difficult with traditional RAG since it only retrieves the top-k relevant chunks, but this task needs full context.
Jan 30 • 10 tweets • 3 min read
Let's build a Multimodal RAG with DeepSeek's latest Janus-Pro (100% local):
The video depicts a multimodal RAG running locally on your computer.
We use:
- ColPali to understand and embed docs using vision capabilities.
- @qdrant_engine as the vector database.
- @deepseek_ai's latest Janus-Pro multimodal LLM to generate a response.
Let's build it!
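The retrieval core, stripped down to a sketch with the colpali-engine package (API as per its README; the model and file names are illustrative). The full app stores these multivector page embeddings in Qdrant and hands the top-scoring pages to Janus-Pro:

```python
# ColPali retrieval sketch: embed page images and a query, then score
# them with late-interaction (MaxSim) matching.
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

name = "vidore/colpali-v1.2"
model = ColPali.from_pretrained(name, torch_dtype=torch.bfloat16)
processor = ColPaliProcessor.from_pretrained(name)

pages = [Image.open("page1.png"), Image.open("page2.png")]
query = ["What does the revenue chart show?"]

with torch.no_grad():
    page_emb = model(**processor.process_images(pages))
    query_emb = model(**processor.process_queries(query))

scores = processor.score_multi_vector(query_emb, page_emb)
print(scores)  # higher score = more relevant page
```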
Jan 28 • 10 tweets • 3 min read
Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally:
Before we begin, here's what we'll be doing.