Avi Chawla
Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Aug 20 12 tweets 4 min read
DeepMind built a simple RAG technique that:

- reduces hallucinations by 40%
- improves answer relevancy by 50%

Let's understand how to use it in RAG systems (with code):

Most RAG apps fail due to retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:

- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration

Let's go!
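Before the full stack, the self-correction idea itself can be sketched framework-free: grade each retrieved chunk, keep only the relevant ones, and fall back to another source (e.g., web search) when retrieval fails. A minimal sketch with stand-in `retrieve`, `grade`, `web_search`, and `generate` functions (all hypothetical, not the thread's actual APIs):

```python
def corrective_rag(query, retrieve, grade, web_search, generate, threshold=0.5):
    """CRAG-style loop: keep only chunks the grader deems relevant;
    if none survive, self-correct by falling back to web search."""
    chunks = retrieve(query)
    relevant = [c for c in chunks if grade(query, c) >= threshold]
    if not relevant:                      # retrieval failed -> self-correct
        relevant = web_search(query)
    return generate(query, relevant)

# Toy plumbing to show the control flow (stand-ins, not real APIs):
docs = {"milvus": "Milvus is a vector database.", "cat": "Cats purr."}
retrieve = lambda q: list(docs.values())
grade = lambda q, c: 1.0 if q.split()[0].lower() in c.lower() else 0.0
web_search = lambda q: [f"web result for: {q}"]
generate = lambda q, ctx: ctx[0]

print(corrective_rag("Milvus overview", retrieve, grade, web_search, generate))
# An out-of-corpus query triggers the web-search fallback:
print(corrective_rag("quantum entanglement", retrieve, grade, web_search, generate))
```

In a real pipeline the grader is an LLM call and the fallback is an actual search tool; the control flow stays the same.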
Aug 17 11 tweets 4 min read
Model Context Protocol (MCP), clearly explained (with visuals):

MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
Aug 14 12 tweets 4 min read
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Here's a complete breakdown (with visuals):

RAG is 80% retrieval and 20% generation.

So if RAG isn't working, it's most likely a retrieval issue, which usually traces back to chunking and embedding.

Contextualized chunk embedding models solve this.

Let's dive in to learn more!
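The core idea behind contextualized chunk embeddings is that a chunk embedded in isolation loses its surrounding meaning ("It was driven by cloud sales" of what?). A crude stand-in for a true contextual embedding model is to condition each chunk on document-level context before embedding; a minimal sketch (the helper name and format are assumptions for illustration):

```python
def contextualize_chunks(doc_title, chunks):
    """Naive chunking embeds each chunk in isolation, so pronouns and
    references lose their meaning. Before embedding, prepend document-level
    context to each chunk (a cheap approximation of what a contextual
    embedding model does internally)."""
    return [f"Document: {doc_title}\nChunk: {c}" for c in chunks]

chunks = ["Revenue grew 12%.", "It was driven by cloud sales."]
for text in contextualize_chunks("ACME Q3 earnings report", chunks):
    print(text)
    print("---")
```

Each contextualized string would then be passed to the embedding model instead of the bare chunk.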
Aug 11 10 tweets 3 min read
Let's fine-tune OpenAI gpt-oss (100% locally):

Today, let's learn how to fine-tune OpenAI's latest gpt-oss locally.

We'll give it multilingual reasoning capabilities as shown in the video.

We'll use:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.

Let's begin!
Aug 8 12 tweets 4 min read
Enterprises build RAG over 100s of data sources, not one!

- Microsoft ships it in M365 products.
- Google ships it in its Vertex AI Search.
- AWS ships it in its Amazon Q Business.

Let's build an MCP-powered RAG over 200+ sources (100% local):

Enterprise data is scattered across many sources.

Today, we'll build a unified MCP server that can query 200+ sources from one interface.

Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally

Let's begin!
Aug 7 11 tweets 4 min read
I have been building AI Agents in production for over a year.

If you want to learn too, here's a simple tutorial (hands-on):

Today, we'll build and deploy a Coding Agent that can scrape docs, write production-ready code, solve issues, and raise PRs, directly from Slack.

Tech stack:
- Claude Code for code generation
- @xpander_ai as the Agent backend
- @firecrawl_dev for scraping

Let's begin!
Aug 6 14 tweets 7 min read
12 MCP, RAG, and Agents cheat sheets for AI engineers (with visuals):

1️⃣ Function calling & MCP for LLMs

Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.

The visual covers how Function Calling & MCP work under the hood.

Check the thread below 👇
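Under the hood, function calling is simple: the app sends the LLM a schema of available tools, the model replies with a JSON tool call, and the host parses it and dispatches to real code. A minimal host-side sketch (tool names and schema layout are illustrative, not any specific vendor's format):

```python
import json

# A tool the app exposes, plus a JSON-schema-style spec the LLM would see:
def get_weather(city: str) -> str:
    return f"22C and sunny in {city}"   # stand-in for a real weather API

TOOLS = {"get_weather": get_weather}
TOOL_SPECS = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

def dispatch(model_output: str) -> str:
    """The host's half of function calling: parse the model's JSON tool
    call and execute the named function with the given arguments."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# Simulated model output (a real LLM would emit this after reading TOOL_SPECS):
print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

MCP standardizes exactly this handshake (tool discovery, call format, transport) so any client can talk to any server, instead of each app inventing its own dispatcher.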
Aug 4 14 tweets 5 min read
A simple technique makes RAG ~32x memory efficient!

- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant

Let's understand how to use it in RAG systems (with code):

Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.

Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq

Let's build it!
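The trick itself fits in a few lines: keep only the sign of each float dimension, so every 32-bit float becomes 1 bit (hence ~32x less memory), and search with Hamming distance. A toy sketch (a real vector DB like Milvus packs bits into bytes and typically reranks top candidates with full-precision vectors):

```python
def binarize(vec):
    """Binary Quantization: keep only the sign of each dimension.
    Each 32-bit float becomes 1 bit -> ~32x less memory."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Distance between binary vectors = number of differing bits."""
    return bin(a ^ b).count("1")

db = [[0.9, -0.1, 0.4, -0.7], [-0.3, 0.8, -0.5, 0.2]]
index = [binarize(v) for v in db]
query = binarize([1.0, -0.2, 0.3, -0.9])     # same signs as db[0]
best = min(range(len(index)), key=lambda i: hamming(query, index[i]))
print(best)  # -> 0
```

Hamming distance on packed bits is just XOR + popcount, which is why the sub-30ms latency over tens of millions of vectors becomes feasible.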
Aug 3 14 tweets 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PinSage) in recommendation

Here are 6 must-know ways to do graph feature engineering (with code):

Just as image, text, and tabular datasets have features, so do graph datasets.

This means when building models on graph datasets, we can engineer these features to achieve better performance.

Let's discuss some feature engineering techniques below!
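Two of the most common engineered node features are a node's degree and the average degree of its neighbors; both fall out of a plain adjacency structure. A minimal stdlib sketch (graph and feature choice are illustrative):

```python
from collections import defaultdict

# Tiny undirected graph as an edge list
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def node_features(n):
    """Two classic engineered node features: degree (local popularity)
    and average neighbor degree (popularity of the neighborhood)."""
    deg = len(adj[n])
    avg_nbr_deg = sum(len(adj[m]) for m in adj[n]) / deg
    return {"degree": deg, "avg_neighbor_degree": avg_nbr_deg}

print(node_features("c"))  # 'c' is connected to a, b, and d
```

These per-node columns can be fed straight into a tabular model, or concatenated with learned GNN embeddings.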
Jul 30 12 tweets 5 min read
I have tested 100+ MCP servers in the last 3 months!

Let's use the best 6 to build an ultimate AI assistant for devs (100% local):

Today, we'll build the ultimate local AI assistant using:

- @mcpuse to connect LLM to MCP servers
- @Stagehanddev MCP for browser access
- @firecrawl_dev MCP for scraping
- @ragieai MCP for multimodal RAG
- @zep_ai Graphiti MCP as memory
- Terminal & GitIngest MCP

Let's dive in!
Jul 27 11 tweets 4 min read
KV caching in LLMs, clearly explained (with visuals):

KV caching is a technique used to speed up LLM inference.

Before understanding the internal details, look at the inference speed difference in the video:

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in!
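The source of the speedup is easy to see by counting work: without a cache, the decoder re-projects keys and values for the whole prefix at every generation step; with a cache, it projects only the newest token and reuses the rest. A toy counter (not a real transformer, just the bookkeeping):

```python
def generate(n_tokens, use_cache):
    """Count how many key/value projections a toy decoder performs
    while generating n_tokens autoregressively."""
    projections = 0
    cache = []
    for step in range(1, n_tokens + 1):
        if use_cache:
            cache.append(f"kv_{step}")  # project ONLY the newest token
            projections += 1
        else:
            projections += step         # re-project the entire prefix
    return projections

print(generate(100, use_cache=False))  # 5050 projections (quadratic)
print(generate(100, use_cache=True))   # 100 projections (linear)
```

That quadratic-to-linear drop in per-step work is what the 42s vs 9s demo reflects, at the cost of extra GPU memory to hold the cache.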
Jul 26 9 tweets 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals):

Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.

The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.

Let's dive in to learn more!
Jul 24 15 tweets 5 min read
Let's compare Qwen 3 Coder & Sonnet 4 for code generation:

Qwen 3 Coder is Alibaba's most powerful open-source coding LLM.

Today, let's build a pipeline to compare it to Sonnet 4 using:

- @LiteLLM for orchestration.
- @deepeval to build the eval pipeline (open-source).
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.

Let's dive in!
Jul 21 13 tweets 5 min read
4 stages of training LLMs from scratch, clearly explained (with visuals):

Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases.

We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

The visual summarizes these techniques.

Let's dive in!
Jul 20 11 tweets 4 min read
I have been training neural networks for 9 years now.

Here are 16 techniques I actively use to optimize model training:

Before we dive in, the following visual covers what we are discussing today.

Let's understand them in detail below!
Jul 19 13 tweets 4 min read
Andrew Ng's team once made a big mistake in a research paper.

And it happened due to randomly splitting the data.

Here's what happened:

It is common to generate train and validation sets using random splitting.

However, in many situations, it can be fatal for model building.

Let's learn below!
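The failure mode is leakage: when correlated rows (say, multiple scans from the same patient) land in both train and validation sets, validation scores look great but don't generalize. The usual fix is group-aware splitting, so each group lives entirely in one set. A stdlib sketch (the patient/scan framing is illustrative; real pipelines often use something like scikit-learn's GroupShuffleSplit):

```python
import random

def group_split(rows, key, frac=0.75, seed=0):
    """Split by group (e.g., patient ID) so that no group
    appears in both the train and validation sets."""
    groups = sorted({key(r) for r in rows})
    random.Random(seed).shuffle(groups)
    cut = int(len(groups) * frac)
    train_groups = set(groups[:cut])
    train = [r for r in rows if key(r) in train_groups]
    val = [r for r in rows if key(r) not in train_groups]
    return train, val

# 4 patients, 3 scans each. A random row-level split would scatter a
# patient's scans across both sets; a group split cannot.
rows = [(p, s) for p in ["p1", "p2", "p3", "p4"] for s in range(3)]
tr, va = group_split(rows, key=lambda r: r[0])
leak = {r[0] for r in tr} & {r[0] for r in va}
print(leak)  # set() -> no patient appears in both sets
```

Leakage is empty by construction here, whereas `random.shuffle` over rows would almost always place some patient on both sides.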
Jul 18 12 tweets 5 min read
After MCP, A2A, & AG-UI, there's another Agent protocol.

It's fully open-source and launched by IBM Research.

Here's a complete breakdown (with code):

ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.

Just like A2A, it lets Agents communicate with Agents. There are some differences, which we shall discuss later.

Let's dive into the code first!
Jul 17 10 tweets 3 min read
How to compress ML models, clearly explained (with code):

Model performance is rarely the only factor that determines which model gets deployed.

Instead, we also consider several operational metrics depicted below.

Knowledge distillation (KD) is popularly used to compress ML models before deployment.

Let's learn about it below.
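At the heart of knowledge distillation is a single loss term: cross-entropy between the teacher's temperature-softened output distribution and the student's, so the student learns the teacher's "dark knowledge" about how classes relate. A minimal stdlib sketch of that soft-target loss (Hinton-style; the example logits are made up):

```python
import math

def softmax(logits, T=1.0):
    """Temperature T > 1 softens the distribution, exposing the
    teacher's relative confidence across wrong classes too."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's softened distribution
    and the student's: the core soft-target term of KD."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

teacher = [3.0, 1.0, 0.2]
aligned = [2.9, 1.1, 0.1]     # student that mimics the teacher
misaligned = [0.1, 1.0, 3.0]  # student that disagrees
print(distillation_loss(aligned, teacher) < distillation_loss(misaligned, teacher))
```

In practice this term is blended with the ordinary hard-label cross-entropy, and minimizing it pulls the small student toward the large teacher's behavior.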
Jul 15 13 tweets 4 min read
Let's build an MCP-powered financial analyst (100% local):

Before we dive in, here's a quick demo of what we're building!

Tech stack:

- @crewAIInc for multi-agent orchestration
- @Ollama to locally serve DeepSeek-R1 LLM
- @cursor_ai as the MCP host

Let's go! 🚀
Jul 11 14 tweets 5 min read
How to sync GPUs in multi-GPU training, clearly explained (with visuals):

One major run-time bottleneck in multi-GPU training is GPU synchronization.

For instance, in multi-GPU training via data parallelism:

- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.

Check this 👇
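The sync step itself is an all-reduce: after each backward pass, every GPU's gradients are averaged so all model replicas apply the identical update and stay in lockstep. A toy sketch of that averaging (real stacks use NCCL ring/tree all-reduce, not a Python loop):

```python
def all_reduce_mean(per_gpu_grads):
    """Data-parallel sync step: average each gradient dimension across
    all GPUs so every replica applies the same parameter update."""
    n = len(per_gpu_grads)
    dim = len(per_gpu_grads[0])
    return [sum(g[i] for g in per_gpu_grads) / n for i in range(dim)]

# 4 GPUs, each holding gradients computed on a different data shard:
grads = [[1.0, 2.0], [3.0, 2.0], [1.0, 4.0], [3.0, 4.0]]
print(all_reduce_mean(grads))  # [2.0, 3.0]
```

Because every GPU must wait for this exchange before stepping, the communication (and any straggler GPU) is what shows up as the run-time bottleneck.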
Jul 10 7 tweets 3 min read
Naive RAG vs. Agentic RAG, clearly explained (with visuals):

Naive RAG has many issues:

- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.

- It cannot reason through complex queries.

- The system can't modify its strategy based on the problem.
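Agentic RAG fixes all three by putting retrieval inside a loop: check whether the context can answer the query, and if not, rewrite the query and retrieve again. A minimal sketch of that control flow with stand-in `retrieve`, `enough`, `rewrite`, and `generate` functions (all hypothetical, for illustration only):

```python
def agentic_rag(query, retrieve, enough, rewrite, generate, max_iters=3):
    """Agentic RAG loop: retrieve, judge whether the context suffices,
    and rewrite/re-retrieve until it does (or the budget runs out)."""
    q = query
    for _ in range(max_iters):
        ctx = retrieve(q)
        if enough(query, ctx):
            return generate(query, ctx)
        q = rewrite(q)              # e.g., expand or decompose the query
    return generate(query, ctx)     # best effort after the budget

# Toy stand-ins showing the control flow:
store = {"llm inference latency": "KV caching cuts latency ~5x."}
retrieve = lambda q: store.get(q, "")
enough = lambda q, ctx: bool(ctx)
rewrite = lambda q: "llm inference latency"   # pretend the agent reformulates
generate = lambda q, ctx: ctx or "I don't know."

print(agentic_rag("why is my LLM slow?", retrieve, enough, rewrite, generate))
```

In a real system, `enough` and `rewrite` are LLM calls, which is exactly what gives the agent the ability to change strategy mid-query instead of retrieving once and hoping.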