Avi Chawla
Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Mar 25 14 tweets 5 min read
Let's build an MCP server (100% locally): Before diving in, here's what we'll be doing:

- Understand MCP with a simple analogy.
- Build a local MCP server and interact with it via @cursor_ai IDE.
- Integrate @firecrawl_dev's MCP server and interact with its tools (shown in the video).

Let's dive in 🚀!
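A minimal, conceptual sketch of what an MCP server does under the hood: it registers tools and dispatches structured requests to them. This is plain Python with hypothetical names (`tool`, `handle_request`), not the real MCP SDK:

```python
import json

# Conceptual sketch of an MCP-style server: it exposes a set of "tools"
# that a client (e.g. an IDE) can discover and invoke by name.
TOOLS = {}

def tool(fn):
    """Register a function as a callable tool, keyed by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def handle_request(raw: str) -> str:
    """Dispatch a JSON request {'tool': ..., 'args': {...}} to a tool."""
    req = json.loads(raw)
    result = TOOLS[req["tool"]](**req["args"])
    return json.dumps({"result": result})
```

The real protocol adds tool discovery, schemas, and a transport layer, but the register-and-dispatch core is the same idea.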
Mar 21 9 tweets 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals): Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.

The visual explains 5 levels of AI agency—from simple responders to fully autonomous agents.

Let's dive in to learn more about them.
Mar 15 9 tweets 3 min read
Let's fine-tune DeepMind's latest Gemma 3 (100% locally): Before we begin, here's what we'll be doing.

We'll fine-tune our private and locally running Gemma 3.

To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.

Let's begin!
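Unsloth's efficiency comes largely from LoRA-style low-rank adapters. A numpy sketch of the underlying idea with illustrative sizes (this is the math, not Unsloth's API):

```python
import numpy as np

# LoRA sketch: instead of updating the full weight matrix W (d x d),
# train two small matrices A (r x d) and B (d x r) with rank r << d.
rng = np.random.default_rng(0)
d, r = 512, 8

W = rng.standard_normal((d, d))         # frozen pretrained weights
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-init: no change at start

x = rng.standard_normal(d)
y = W @ x + B @ (A @ x)                 # adapted forward pass

# Trainable parameters drop from d*d to 2*d*r.
full, lora = d * d, 2 * d * r
```

With B initialised to zero, the adapted model starts out identical to the base model and only diverges as A and B are trained.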
Mar 14 15 tweets 5 min read
Let's build a multi-agent book writer, powered by DeepMind's Gemma 3 (100% local): Today, we are building an Agentic workflow that writes a 20k word book from a 3-5 word book title.

Tech stack:
- Bright Data to scrape the web at scale.
- @crewAIInc for orchestration.
- @GoogleDeepMind's Gemma 3 as the LLM.
- @ollama to serve Gemma 3 locally.

Let's build it!
Mar 13 7 tweets 2 min read
Let's build a mini-ChatGPT with Google DeepMind's latest Gemma 3 (100% local): Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would chat with ChatGPT.

It uses:
- @GoogleDeepMind's Gemma 3 as the LLM.
- @Ollama to locally serve Gemma 3.
- @chainlit_io for the UI.

Let's build it!
Mar 12 10 tweets 3 min read
Model compression in ML, clearly explained (with code): Model performance is rarely the only factor in deciding which model gets deployed.

Instead, we also consider several operational metrics depicted below.

Knowledge distillation (KD) is popularly used to compress ML models before deployment.

Let's learn about it below.
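A minimal numpy sketch of the distillation objective: the student is trained to match the teacher's temperature-softened output distribution (function names and logits are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax (numerically stable)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher = [3.0, 1.0, 0.2]
```

The loss is zero when the student exactly matches the teacher and grows as their distributions diverge; in practice it is combined with the usual cross-entropy on hard labels.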
Mar 4 10 tweets 3 min read
Let's build a RAG app over audio files with DeepSeek-R1 (running locally): Before we begin, here's a quick demo of what we're building!

We will use:

- @AssemblyAI for transcribing audio files.
- @qdrant_engine for the vector database.
- @llama_index for orchestration.
- DeepSeek-R1 as the LLM.

Let's dive in!
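The retrieval step of the pipeline can be sketched as follows. Transcription (AssemblyAI) and the real embedding model are stubbed with a toy hashing embedder over hypothetical transcript chunks; only the top-k cosine retrieval logic is the point:

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedder (stands in for a real model)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

# Hypothetical chunks of an audio transcript.
transcript_chunks = [
    "the speaker introduces retrieval augmented generation",
    "pricing details for the enterprise plan",
    "closing remarks and audience questions",
]

def retrieve(query: str, chunks, k: int = 1):
    """Return the k chunks most similar to the query (cosine)."""
    q = embed(query)
    scores = [float(q @ embed(c)) for c in chunks]
    top = sorted(range(len(chunks)), key=lambda i: -scores[i])[:k]
    return [chunks[i] for i in top]
```

In the actual app, Qdrant stores the vectors and DeepSeek-R1 generates the answer from the retrieved chunks.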
Mar 3 4 tweets 1 min read
Learned this cool way of creating plots in a DataFrame's cell.

These are called sparklines. The idea is to render a plot inside an HTML image tag in Jupyter. Here's the code that adds a histogram to the cell (shared as an image in the thread).
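Since the original snippet was shared as an image, here is a dependency-free sketch of the same idea: draw a tiny chart, base64-encode it, and embed it in an `<img>` tag that Jupyter can render inside a cell. It uses an inline SVG line instead of a matplotlib histogram:

```python
import base64

def sparkline(values, width=100, height=20):
    """Return an <img> tag containing a tiny inline SVG line chart."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    pts = " ".join(
        f"{i * width / (len(values) - 1):.1f},"
        f"{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" '
           f'width="{width}" height="{height}">'
           f'<polyline points="{pts}" fill="none" stroke="black"/></svg>')
    b64 = base64.b64encode(svg.encode()).decode()
    return f'<img src="data:image/svg+xml;base64,{b64}"/>'
```

In a notebook you would then do something like `df["trend"] = df["series"].map(sparkline)` and render with `df.to_html(escape=False)` or `IPython.display.HTML`, so the image tag is interpreted rather than escaped.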
Feb 25 10 tweets 3 min read
Transformer vs. Mixture of Experts in LLMs, clearly explained (with visuals): Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.

The visual below explains how they differ from Transformers.

Let's dive in to learn more about MoE!
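A numpy sketch of the routing step that distinguishes an MoE layer from a dense Transformer layer: a gating network scores every expert, only the top-k experts run, and their outputs are blended by renormalised gate weights (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 4, 2

# Each "expert" is a small feed-forward block (here just a matrix).
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_W = rng.standard_normal((n_experts, d))   # gating network

def moe_forward(x):
    logits = gate_W @ x
    top = np.argsort(logits)[-k:]              # indices of top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # renormalised gate weights
    y = sum(wi * (experts[i] @ x) for wi, i in zip(w, top))
    return y, top

x = rng.standard_normal(d)
y, chosen = moe_forward(x)
```

This is why MoE models can have far more total parameters than a dense model while keeping per-token compute roughly constant: only k of the n experts are active for any given token.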
Feb 24 9 tweets 3 min read
Tool calling in LLMs, clearly explained (with code): When generating text, the LLM may need to invoke external tools or APIs to perform specific tasks beyond its built-in capabilities.

This is known as tool calling, and it turns the LLM into a coordinator that decides when to delegate work to external tools.
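The loop can be sketched with a stubbed model; `fake_llm` stands in for a real LLM that would pick the tool and arguments from the tool schemas in the prompt:

```python
import json

def get_weather(city: str) -> str:
    """Stand-in for a real weather API call."""
    return f"18C and cloudy in {city}"

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # A real LLM would emit this structured call based on the prompt
    # and the tool schemas it was given.
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def run(prompt: str) -> str:
    """One round of the tool-calling loop: decide, execute, respond."""
    call = json.loads(fake_llm(prompt))
    result = TOOLS[call["tool"]](**call["args"])
    return f"Tool said: {result}"
```

Real APIs add schema validation and feed the tool result back to the model for a final natural-language answer, but the decide → execute → respond shape is the same.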
Feb 22 9 tweets 3 min read
Kernel trick in ML, clearly explained (with visuals): So many ML algorithms use kernels for robust modeling:
• SVM
• KernelPCA, and more.

Consider two n-dimensional vectors X and Y. A kernel function lets us compute their dot product in m-dimensional space (m>>n) without even knowing the mapping.

Confused? Let's dive in!
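The claim is easy to verify numerically for the polynomial kernel k(x, y) = (x . y)^2: it equals a dot product in the explicit space of all pairwise products x_i * x_j, but never builds that space:

```python
import numpy as np

def phi(v):
    """Explicit feature map: all pairwise products v_i * v_j."""
    return np.outer(v, v).ravel()

def kernel(x, y):
    """The same quantity computed entirely in the original space."""
    return float(np.dot(x, y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])
```

For n-dimensional inputs, `phi` lives in n^2 dimensions, yet `kernel` only ever touches n numbers; that gap is the whole point of the trick.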
Feb 17 7 tweets 2 min read
Eigenvalues and eigenvectors, clearly explained: The concept of eigenvalues & eigenvectors is widely used in data science but not well understood!

Today, I'll clearly explain their meaning & significance.
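The defining property, A v = lambda * v (A stretches the eigenvector v without rotating it), can be checked directly with numpy on a small matrix:

```python
import numpy as np

# A symmetric 2x2 matrix with eigenvalues 3 and 1.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)

# Each column of vecs is the eigenvector for the matching eigenvalue.
v, lam = vecs[:, 0], vals[0]
```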
Feb 14 11 tweets 4 min read
KV caching in LLMs, clearly explained (with visuals): KV caching is a technique used to speed up LLM inference.

Before understanding the internal details, look at the inference speed difference in the video:

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in!
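A sketch of where the saving comes from: without a cache, the keys and values of the whole prefix are reprojected at every generation step; with a cache, only the newest token's K and V are computed and appended (toy sizes, counting projections only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 5                       # toy hidden size and sequence length
Wk = rng.standard_normal((d, d))  # key projection
Wv = rng.standard_normal((d, d))  # value projection
tokens = rng.standard_normal((T, d))

# Without cache: recompute K and V for the whole prefix at every step,
# i.e. 2*t projections at step t.
no_cache_ops = sum(2 * t for t in range(1, T + 1))

# With cache: one K and one V projection per new token.
cache_k, cache_v, cached_ops = [], [], 0
for x in tokens:
    cache_k.append(Wk @ x)
    cache_v.append(Wv @ x)
    cached_ops += 2

# The cached keys are identical to what the uncached path would
# compute at the final step.
K_full = tokens @ Wk.T
```

Per-step work drops from O(t) projections to O(1), which is where speedups like the ~5x in the video come from (at the cost of memory to hold the cache).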
Feb 8 9 tweets 3 min read
Let's build our own reasoning model (like DeepSeek-R1) 100% locally: Before we begin, here's what we'll be doing.

We'll train our own reasoning model like DeepSeek-R1 (check the image).

To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- Llama 3.1-8B as the LLM to add reasoning capabilities to.

Let's implement this.
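R1-style reasoning training typically relies on simple rule-based rewards inside an RL loop such as GRPO. A sketch of one such reward, a format check that the model wrapped its chain of thought in `<think>` tags before answering (hypothetical function name):

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion is '<think>...</think>' then an answer."""
    pattern = r"^<think>.+?</think>\s*.+$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0
```

In practice this is combined with a correctness reward on the final answer; the two together are enough to teach the base model to reason before responding.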
Feb 6 9 tweets 3 min read
Decorators in Python, clearly explained (with code): Decorators are one of the most powerful features of Python!

However, understanding them can be a bit overwhelming!

Today, let's understand how decorators work!
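A minimal example: a decorator is just a function that takes a function and returns a new one. Here a hypothetical `timed` decorator wraps a function to record how long its last call took:

```python
import functools
import time

def timed(fn):
    @functools.wraps(fn)          # preserve the wrapped function's name/docs
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    return wrapper

@timed                            # same as: slow_square = timed(slow_square)
def slow_square(n):
    return n * n
```

The `@timed` line is pure syntactic sugar for reassigning the name, which is why decorators compose and stack so naturally.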
Feb 1 10 tweets 3 min read
90% of Python programmers don't know these 7 uses of underscore: 1) Retrieve the last computed value

In an interactive session (a REPL or Jupyter), the underscore `_` holds the last computed value.
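The last-value trick itself only works interactively, but several of the other underscore idioms from the thread are runnable anywhere:

```python
# 1) Throwaway variable in unpacking
first, _, last = [1, 2, 3]

# 2) Ignore the loop variable
total = sum(1 for _ in range(5))

# 3) Digit separators for readability
population = 1_000_000

# 4) Extended unpacking with a throwaway rest
head, *_ = [10, 20, 30]
```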
Jan 31 7 tweets 2 min read
Traditional RAG vs. Graph RAG, clearly explained (with visuals): RAG has many issues.

Imagine you want to summarize a biography, and each chapter of the document covers a specific accomplishment of a person (P).

This is difficult with traditional RAG since it only retrieves the top-k relevant chunks, but this task needs full context.
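A toy sketch of why a graph helps with the biography example: instead of hoping the top-k chunks happen to cover every chapter, walk every edge attached to P in a small knowledge graph (hypothetical graph data):

```python
# Tiny knowledge graph extracted from the (hypothetical) biography.
graph = {
    "P": {"founded": "Acme Corp", "won": "Turing Award", "wrote": "a memoir"},
    "Acme Corp": {"acquired": "SmallCo"},
}

def all_facts(entity, g, depth=1):
    """Gather every (subject, relation, object) reachable within `depth` hops."""
    facts, frontier = [], [entity]
    for _ in range(depth + 1):
        nxt = []
        for node in frontier:
            for rel, obj in g.get(node, {}).items():
                facts.append((node, rel, obj))
                nxt.append(obj)
        frontier = nxt
    return facts
```

Summarising from `all_facts("P", graph)` guarantees every accomplishment appears in the context, which top-k chunk retrieval cannot promise.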
Jan 30 10 tweets 3 min read
Let's build a Multimodal RAG with DeepSeek's latest Janus-Pro (100% local): The video depicts a multimodal RAG running locally on your computer.

We use:
- Colpali to understand and embed docs using vision capabilities.
- @qdrant_engine as the vector database.
- @deepseek_ai's latest Janus-Pro multimodal LLM to generate a response.

Let's build it!
Jan 28 10 tweets 3 min read
Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally: Before we begin, here's what we'll be doing.

We'll fine-tune our private and locally running DeepSeek-R1 (distilled Llama variant).

To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.

Let's begin!
Jan 27 11 tweets 3 min read
5 LLM fine-tuning techniques, clearly explained (with visuals): Traditional fine-tuning (depicted below) is infeasible with LLMs.

This is because LLMs have billions of parameters and are hundreds of GBs in size.

Not everyone has access to such computing infrastructure, and it does not make business sense either.
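A back-of-the-envelope check on that claim, using a common mixed-precision Adam accounting (fp16 weights and gradients, fp32 master weights, two fp32 optimizer moments; activations ignored, numbers illustrative):

```python
# Full fine-tuning memory for an 8B-parameter model, bytes per parameter:
params = 8e9

weights_fp16 = params * 2        # fp16 model weights
grads_fp16 = params * 2          # fp16 gradients
master_fp32 = params * 4         # fp32 master copy of weights
adam_moments = params * 4 * 2    # two fp32 Adam moment buffers

total_gb = (weights_fp16 + grads_fp16 + master_fp32 + adam_moments) / 1e9
```

That is 128 GB before a single activation is stored, which is why parameter-efficient techniques like LoRA (which train only a tiny fraction of the weights) exist.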
Jan 26 7 tweets 2 min read
95% of Jupyter Notebook users don't know these 5 AWESOME tricks: 1) Retrieve a cell’s output in Jupyter

If you often forget to assign the result of a Jupyter cell to a variable, you can use the `Out` dictionary to retrieve it later: `Out[n]` holds the output of cell `n`.