Avi Chawla
Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Sep 11 15 tweets 6 min read
Let's build a context engineering workflow, step by step: Today, we'll build a multi-agent research assistant using context engineering principles.

Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration

Let's go!
Sep 8 11 tweets 4 min read
I have been fine-tuning LLMs for over two years now!

Here are the top 5 LLM fine-tuning techniques, explained visually: Traditional fine-tuning is impractical for LLMs (billions of parameters; hundreds of GB).

Since this scale of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is used extensively.

Today, we'll cover the top 5 PEFT techniques, step by step.
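To make the idea concrete, here's a minimal LoRA sketch using Hugging Face's peft library. The base model and hyperparameters below are illustrative placeholders, not the thread's exact setup:

```python
# Minimal LoRA sketch with Hugging Face peft (illustrative hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works here

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```

The key point: only the small adapter matrices are trained, while the billions of base weights stay frozen.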
Sep 6 13 tweets 4 min read
Let's generate our own LLM fine-tuning dataset (100% local): Before we begin, here's what we're doing today!

We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?

Finally, we'll create our own instruction fine-tuning dataset.

Let's dive in!
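For a preview of what we're building toward, here's one hand-written record in the common Alpaca-style instruction format, written out as JSONL. The fields and values are purely illustrative:

```python
# A minimal, hand-written instruction fine-tuning record (Alpaca-style fields).
import json

records = [
    {
        "instruction": "Summarize the following text in one sentence.",
        "input": "KV caching stores attention keys and values so they are not recomputed.",
        "output": "KV caching speeds up LLM inference by reusing previously computed keys and values.",
    },
]

# Most fine-tuning libraries accept one JSON object per line (JSONL).
with open("instructions.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```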
Sep 4 11 tweets 4 min read
7 LLM generation parameters, clearly explained (with visuals): Every generation from an LLM is shaped by parameters under the hood.

Knowing how to tune them is important for producing sharper, more controlled outputs.

The visual shows 7 parameters that matter most.

Let's understand them one by one!
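Most of these parameters are exposed directly on chat-completion APIs. Here's a rough sketch with the OpenAI Python client; the values are arbitrary examples, not recommendations:

```python
# Illustrative generation parameters on a chat-completion call
# (values are arbitrary examples, not recommendations).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about vectors."}],
    temperature=0.7,        # randomness of sampling
    top_p=0.9,              # nucleus sampling cutoff
    max_tokens=100,         # cap on generated tokens
    frequency_penalty=0.2,  # discourage repeating tokens
    presence_penalty=0.1,   # encourage introducing new topics
    stop=["\n\n"],          # stop sequences
)
print(response.choices[0].message.content)
```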
Aug 28 9 tweets 3 min read
Temperature in LLMs, clearly explained (with code): Let's prompt OpenAI GPT-3.5 with a low temperature value twice.

The LLM produces identical responses both times.

Check the response below 👇
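Here's a minimal sketch of that experiment with the OpenAI Python client (the prompt is a placeholder). With temperature at or near 0, sampling becomes near-greedy, so repeated calls tend to return the same completion:

```python
# Prompt the same model twice with temperature 0; the two responses are
# (almost always) identical because sampling becomes near-greedy.
from openai import OpenAI

client = OpenAI()
prompt = "Explain temperature in LLMs in one sentence."

for i in range(2):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(f"Run {i + 1}: {resp.choices[0].message.content}")
```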
Aug 27 11 tweets 4 min read
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can integrate any of 100k+ tools.

Here's a step-by-step breakdown (100% local): To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
Aug 25 12 tweets 4 min read
I removed 74% of neurons from a neural network.

It dropped the accuracy by just 0.50%.

Here's a breakdown (with code): A trained neural network almost always has neurons that contribute little to its performance.

But they still consume memory.

These can be removed without significantly compromising accuracy.

Let's see how to identify them!
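One common way to find such neurons is magnitude-based pruning: rank neurons by the norm of their weights and zero out the weakest ones. A minimal NumPy sketch of the idea (layer sizes and the 74% ratio are illustrative):

```python
# Magnitude-based pruning sketch: zero out the weakest neurons of a dense layer.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256))               # one layer: 256 output neurons

neuron_strength = np.linalg.norm(W, axis=0)   # L2 norm of each neuron's incoming weights
k = int(0.74 * W.shape[1])                    # prune 74% of the neurons (as in the thread)
prune_idx = np.argsort(neuron_strength)[:k]   # indices of the weakest neurons

W_pruned = W.copy()
W_pruned[:, prune_idx] = 0.0                  # remove their contribution
print(f"Pruned {k}/{W.shape[1]} neurons")
```

In practice you would prune based on the trained weights (or activations on a validation set), then re-evaluate accuracy.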
Aug 23 11 tweets 4 min read
The growth of LLM context length with time:

- GPT-3.5-turbo → 4k tokens
- OpenAI GPT4 → 8k tokens
- Claude 2 → 100k tokens
- Llama 3 → 128k tokens
- Gemini → 1M tokens

Let's understand how they extend the context length of LLMs: In a traditional transformer, a model processing "8x" tokens requires 64 times more computation (quadratic growth) than one handling "x" tokens.

Thus, extending the context window isn't as simple as just increasing the size of the matrices.

Check this 👇
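The quadratic growth comes from self-attention comparing every token with every other token: 8x the tokens means 8² = 64x the pairwise scores. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: attention cost grows quadratically with context length.
def attention_pairs(n_tokens: int) -> int:
    # every token attends to every token -> n^2 pairwise scores
    return n_tokens * n_tokens

base = attention_pairs(4_000)
longer = attention_pairs(32_000)   # 8x more tokens
print(longer / base)               # -> 64.0
```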
Aug 22 10 tweets 3 min read
You are in an ML interview.

Your interviewer asks: "Why is Kernel Trick called a Trick?"

Here's how to answer (with simple maths): Many ML algorithms use kernels for robust modeling, like SVM and KernelPCA.

If we have two n-dimensional vectors, a kernel function lets us compute their dot product in m-dimensional space (m>>n) without explicitly projecting the vectors.

Let's understand more with maths!
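A concrete check with a degree-2 polynomial kernel: computing (x·y)² in the original space gives exactly the same number as an explicit dot product in the much larger space of all pairwise products.

```python
# Kernel trick sanity check: (x . y)^2 equals the dot product of the
# explicit degree-2 feature maps phi(x) and phi(y).
import numpy as np

def phi(v):
    # explicit feature map for the degree-2 polynomial kernel:
    # all pairwise products v_i * v_j (n dims -> n^2 dims)
    return np.outer(v, v).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

kernel_value = np.dot(x, y) ** 2          # computed in the original 3-D space
explicit_value = np.dot(phi(x), phi(y))   # computed in the 9-D feature space

print(kernel_value, explicit_value)       # both print the same number
```

That's the "trick": we get the high-dimensional geometry without ever materializing the high-dimensional vectors.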
Aug 20 12 tweets 4 min read
DeepMind built a simple RAG technique that:

- reduces hallucinations by 40%
- improves answer relevancy by 50%

Let's understand how to use it in RAG systems (with code): Most RAG apps fail due to retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:

- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration

Let's go!
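The core self-correction step is simple in isolation: grade each retrieved chunk for relevance and fall back to another source when retrieval looks weak. A framework-agnostic sketch of that grading loop; the model, prompt, thresholds, and the web_search stub are illustrative placeholders, not the thread's exact implementation:

```python
# Sketch of a retrieval self-correction step: grade each retrieved chunk with
# an LLM, and fall back to another retriever if too few chunks pass.
from openai import OpenAI

client = OpenAI()

def web_search(question: str) -> list[str]:
    # placeholder for a real fallback retriever (e.g. a web-search API)
    return []

def grade_chunk(question: str, chunk: str) -> bool:
    prompt = (
        "Does the passage help answer the question? Answer yes or no.\n"
        f"Question: {question}\nPassage: {chunk}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def correct_retrieval(question: str, chunks: list[str]) -> list[str]:
    relevant = [c for c in chunks if grade_chunk(question, c)]
    if len(relevant) < 2:              # retrieval looks weak -> fall back
        relevant += web_search(question)
    return relevant
```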
Aug 17 11 tweets 4 min read
Model Context Protocol (MCP), clearly explained (with visuals): MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
Aug 14 12 tweets 4 min read
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Here's a complete breakdown (with visuals): RAG is 80% retrieval and 20% generation.

So if RAG isn't working, it's most likely a retrieval issue, which usually originates in chunking and embedding.

Contextualized chunk embedding models solve this.

Let's dive in to learn more!
Aug 11 10 tweets 3 min read
Let's fine-tune OpenAI gpt-oss (100% locally): Today, let's learn how to fine-tune OpenAI's latest gpt-oss locally.

We'll give it multilingual reasoning capabilities as shown in the video.

We'll use:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.

Let's begin!
Aug 8 12 tweets 4 min read
Enterprises build RAG over 100s of data sources, not one!

- Microsoft ships it in M365 products.
- Google ships it in its Vertex AI Search.
- AWS ships it in its Amazon Q Business.

Let's build an MCP-powered RAG over 200+ sources (100% local): Enterprise data is scattered across many sources.

Today, we'll build a unified MCP server that can query 200+ sources from one interface.

Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally

Let's begin!
Aug 7 11 tweets 4 min read
I have been building AI Agents in production for over a year.

If you want to learn too, here's a simple tutorial (hands-on): Today, we'll build and deploy a Coding Agent that can scrape docs, write production-ready code, solve issues and raise PRs, directly from Slack.

Tech stack:
- Claude Code for code generation
- @xpander_ai as the Agent backend
- @firecrawl_dev for scraping

Let's begin!
Aug 6 14 tweets 7 min read
12 MCP, RAG, and Agents cheat sheets for AI engineers (with visuals): 1️⃣ Function calling & MCP for LLMs

Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.

The visual covers how Function Calling & MCP work under the hood.

Check the thread below 👇
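For reference, classic function calling means describing your tools as JSON schemas that the model can choose to invoke. A minimal sketch with the OpenAI client; the get_weather tool is a made-up example:

```python
# Classic function calling: describe a tool as a JSON schema and let the
# model decide whether to call it. The get_weather tool is a made-up example.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool call, if any
```

MCP standardizes the same idea across clients and servers, so tools don't have to be re-declared per app.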
Aug 4 14 tweets 5 min read
A simple technique makes RAG ~32x more memory efficient!

- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant

Let's understand how to use it in RAG systems (with code): Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.

Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq

Let's build it!
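Binary Quantization itself is almost a one-liner: keep only the sign of each embedding dimension, which turns a 32-bit float into a single bit (the ~32x memory saving) and lets you compare vectors with cheap Hamming distance. A NumPy sketch with toy embeddings:

```python
# Binary quantization sketch: keep only the sign of each embedding dimension.
# float32 (32 bits/dim) -> 1 bit/dim, compared via Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768)).astype(np.float32)   # toy embeddings

binary = (embeddings > 0).astype(np.uint8)   # 1 bit of information per dimension
packed = np.packbits(binary, axis=1)         # store 8 dimensions per byte

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # XOR the packed bits and count the ones
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

query = (rng.normal(size=768) > 0).astype(np.uint8)
packed_query = np.packbits(query)
scores = [hamming_distance(packed_query, row) for row in packed]
print(np.argsort(scores)[:5])                # 5 nearest neighbours by Hamming distance
```

Production systems typically use the binary index for a fast first pass and rescore the top candidates with full-precision vectors.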
Aug 3 14 tweets 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PingSage) in recommendation

Here are 6 must-know techniques for graph feature engineering (with code): Just as image, text, and tabular datasets have features, so do graph datasets.

This means when building models on graph datasets, we can engineer these features to achieve better performance.

Let's discuss some feature engineering techniques below!
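As a small illustration, libraries like NetworkX give you several classic node-level features in one call each (degree, clustering coefficient, PageRank). The graph here is a toy built-in example:

```python
# Classic node-level graph features on a toy graph with NetworkX.
import networkx as nx

G = nx.karate_club_graph()        # small built-in social graph

degree = dict(G.degree())         # how many neighbours each node has
clustering = nx.clustering(G)     # how tightly each node's neighbours connect
pagerank = nx.pagerank(G)         # importance based on link structure

# Assemble a per-node feature row, ready to feed into a downstream model.
features = {
    n: [degree[n], clustering[n], pagerank[n]]
    for n in G.nodes()
}
print(features[0])
```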
Jul 30 12 tweets 5 min read
I have tested 100+ MCP servers in the last 3 months!

Let's use the best 6 to build an ultimate AI assistant for devs (100% local): Today, we'll build a local ultimate AI assistant using:

- @mcpuse to connect LLM to MCP servers
- @Stagehanddev MCP for browser access
- @firecrawl_dev MCP for scraping
- @ragieai MCP for multimodal RAG
- @zep_ai Graphiti MCP as memory
- Terminal & GitIngest MCP

Let's dive in!
Jul 27 11 tweets 4 min read
KV caching in LLMs, clearly explained (with visuals): KV caching is a technique used to speed up LLM inference.

Before understanding the internal details, look at the inference speed difference in the video:

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in!
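A minimal way to see the effect yourself with Hugging Face transformers: generate once with the cache enabled and once with it disabled, and time both. The model choice is illustrative, and exact speedups depend on hardware and sequence length:

```python
# KV caching in practice: generation with and without the cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("KV caching speeds up decoding because", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    model.generate(**inputs, max_new_tokens=128, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.time() - start:.2f}s")
```

With the cache on, each new token reuses the keys and values of all previous tokens instead of recomputing them from scratch.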
Jul 26 9 tweets 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals): Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.

The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.

Let's dive in to learn more!