Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Sep 11 • 15 tweets • 6 min read
Let's build a context engineering workflow, step by step:
Today, we'll build a multi-agent research assistant using context engineering principles.
Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration
Let's go!
Sep 8 • 11 tweets • 4 min read
I have been fine-tuning LLMs for over two years now!
Here are the top 5 LLM fine-tuning techniques, explained visually:
Traditional full fine-tuning is impractical for LLMs: they have billions of parameters and weigh hundreds of GBs.
Since that kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is used extensively.
Today, we’ll cover the top 5 PEFT techniques, step by step.
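For a taste of what's coming, here's a minimal LoRA sketch in plain PyTorch (the most popular PEFT technique; dimensions and rank are illustrative, not the thread's exact setup):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
# Only A and B train: 2 * 8 * 4096 params instead of 4096 * 4096.
```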
Sep 6 • 13 tweets • 4 min read
Let's generate our own LLM fine-tuning dataset (100% local):
Before we begin, here's what we're doing today!
We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?
Finally, we'll create our own instruction fine-tuning dataset.
Let's dive in!
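For reference, the end product is a file of Alpaca-style instruction/input/output records. Here's a minimal sketch of generating them with a local model via Ollama (the model name and seed prompts are placeholder assumptions):

```python
import json
import ollama  # assumes a local Ollama server is running

seed_prompts = ["Explain KV caching in LLMs.", "Compare LoRA and full fine-tuning."]
records = []
for prompt in seed_prompts:
    # Each local-model answer becomes the "output" of one training record.
    reply = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": prompt}])
    records.append({"instruction": prompt, "input": "", "output": reply["message"]["content"]})

with open("instruct_dataset.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```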
Sep 4 • 11 tweets • 4 min read
7 LLM generation parameters, clearly explained (with visuals):
Every generation from an LLM is shaped by parameters under the hood.
Knowing how to tune them is important for producing sharper, more controlled outputs.
The visual shows 7 parameters that matter most.
Let's understand them one by one!
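For reference, here's roughly how these knobs appear in an OpenAI-style API call (values are illustrative, and the visual's exact seven may differ slightly):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name three uses of RAG."}],
    temperature=0.7,        # randomness of sampling
    top_p=0.9,              # nucleus sampling: smallest token set covering 90% probability
    max_tokens=200,         # hard cap on output length
    frequency_penalty=0.5,  # penalize tokens proportionally to how often they appeared
    presence_penalty=0.2,   # penalize any token that has appeared at all
    stop=["\n\n"],          # cut generation at this sequence
    seed=42,                # best-effort reproducibility
)
print(resp.choices[0].message.content)
```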
Aug 28 • 9 tweets • 3 min read
Temperature in LLMs, clearly explained (with code):
Let's prompt OpenAI's GPT-3.5 twice with a low temperature value.
Both runs produce identical responses from the LLM.
Check the response below👇
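Under the hood, temperature simply rescales the logits before the softmax. A tiny NumPy sketch:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    scaled = np.array(logits) / T        # temperature divides the logits
    e = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    return e / e.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, T=0.1))  # ~[1, 0, 0]: near-deterministic
print(softmax_with_temperature(logits, T=2.0))  # much flatter: more random sampling
```

Low T concentrates probability on the top token, which is why repeated runs come back (nearly) identical.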
Aug 27 • 11 tweets • 4 min read
There's a new way to build production-grade MCP servers.
- It takes less than a minute.
- You don't have to write any code.
- You can integrate any of 100k+ tools.
Here's a step-by-step breakdown (100% local):
To build MCP servers from scratch with custom tools, one has to:
- read the API docs
- implement MCP tools
- test them, and much more
Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).
Let's dive in!
Aug 25 • 12 tweets • 4 min read
I removed 74% of neurons from a neural network.
It dropped the accuracy by just 0.50%.
Here's a breakdown (with code):
A trained neural network typically has neurons that contribute little to its performance.
But they still consume memory.
These can be removed without significantly compromising accuracy.
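Here's a minimal sketch of the idea with PyTorch's built-in pruning utilities (the toy model is illustrative; 0.74 mirrors the 74% figure above):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

# Structured pruning: zero out the 74% of output neurons (rows of the
# weight matrix) with the smallest L1 norm.
prune.ln_structured(model[0], name="weight", amount=0.74, n=1, dim=0)
prune.remove(model[0], "weight")  # bake the mask into the weights

# In practice, you'd fine-tune for a few epochs afterwards to recover accuracy.
```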
Let's understand what it takes to extend the context length of LLMs:
In a traditional transformer, attention cost grows quadratically: a model processing 8x tokens requires 64 times more computation than one handling x tokens.
Thus, extending the context window isn't as simple as just making the matrices bigger.
Check this 👇
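A quick sanity check of that quadratic growth:

```python
# Self-attention builds an n x n score matrix, so 8x tokens means 64x the work.
for n in (1_000, 8_000):
    print(f"{n:>6} tokens -> {n * n:>13,} attention scores")
# 64,000,000 / 1,000,000 = 64x
```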
Aug 22 • 10 tweets • 3 min read
You are in an ML interview.
Your interviewer asks: "Why is Kernel Trick called a Trick?"
Here's how to answer (with simple maths):
Many ML algorithms use kernels for robust modeling, like SVM and KernelPCA.
If we have two n-dimensional vectors, a kernel function lets us compute their dot product in m-dimensional space (m>>n) without explicitly projecting the vectors.
Let's understand more with maths!
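Here's a tiny NumPy demo for the degree-2 polynomial kernel: the kernel value computed in the original 2D space matches the dot product in the explicit 3D feature space:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])

explicit = phi(x) @ phi(y)   # dot product after projecting to 3D
kernel = (x @ y) ** 2        # same value, computed entirely in 2D
print(explicit, kernel)      # both 121.0
```

The "trick" is that the kernel never materializes the higher-dimensional vectors.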
Aug 20 • 12 tweets • 4 min read
DeepMind built a simple RAG technique that:
- reduces hallucinations by 40%
- improves answer relevancy by 50%
Let's understand how to use it in RAG systems (with code):
Most RAG apps fail due to retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:
- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration
Let's go!
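The core self-correction loop looks roughly like this (a sketch; retrieve, grade, web_search, and generate are hypothetical stand-ins for the Milvus, LLM-grader, Firecrawl, and LLM calls):

```python
def corrective_rag(query, retrieve, grade, web_search, generate):
    """Grade retrieved chunks; fall back to web search when they look irrelevant."""
    docs = retrieve(query)                                      # vector-DB retrieval
    good = [d for d in docs if grade(query, d) == "relevant"]   # LLM-as-a-judge
    if not good:
        good = web_search(query)                                # corrective fallback
    return generate(query, context=good)                        # grounded answer
```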
Aug 17 • 11 tweets • 4 min read
Model Context Protocol (MCP), clearly explained (with visuals):
MCP is like a USB-C port for your AI applications.
Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.
Let's dive in! 🚀
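For the curious, a minimal MCP server using the official Python SDK's FastMCP helper looks like this (the tool itself is a toy):

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city}."

if __name__ == "__main__":
    mcp.run()  # serves the MCP protocol over stdio by default
```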
Aug 14 • 12 tweets • 4 min read
A new embedding model cuts vector DB costs by ~200x.
It also outperforms OpenAI and Cohere models.
Here's a complete breakdown (with visuals):
RAG is 80% retrieval and 20% generation.
So if RAG isn't working, it's most likely a retrieval issue, which usually originates in chunking and embedding.
Contextualized chunk embedding models solve this.
Let's dive in to learn more!
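The thread covers a dedicated model, but you can approximate the underlying idea by handing the embedder document-level context for every chunk (a sketch; embed is a placeholder for any embedding call):

```python
def contextualized_chunk_embeddings(doc_title, chunks, embed):
    """Embed each chunk together with document-level context,
    so no chunk is interpreted in isolation."""
    return [embed(f"Document: {doc_title}\nChunk: {chunk}") for chunk in chunks]

# Naive chunk embeddings lose cross-chunk context; keeping the document's
# topic attached to every chunk typically improves retrieval quality.
```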
Aug 11 • 10 tweets • 3 min read
Let's fine-tune OpenAI gpt-oss (100% locally):
Today, let's learn how to fine-tune OpenAI's latest gpt-oss locally.
We'll give it multilingual reasoning capabilities as shown in the video.
We'll use:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.
Let's begin!
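A rough sketch of the Unsloth side (the checkpoint name and hyperparameters are assumptions; check Unsloth's docs for the exact gpt-oss recipe):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=True,                 # QLoRA-style memory savings
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# ...then train with a standard SFT trainer on a multilingual reasoning dataset.
```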
Aug 8 • 12 tweets • 4 min read
Enterprises build RAG over 100s of data sources, not one!
- Microsoft ships it in M365 products.
- Google ships it in its Vertex AI Search.
- AWS ships it in its Amazon Q Business.
Let's build an MCP-powered RAG over 200+ sources (100% local):
Enterprise data is scattered across many sources.
Today, we'll build a unified MCP server that can query 200+ sources from one interface.
Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally
Let's begin!
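A rough sketch of the client side with mcp-use (the config shape, server entry, and model tag are all assumptions):

```python
import asyncio
from mcp_use import MCPAgent, MCPClient
from langchain_ollama import ChatOllama

# Hypothetical local MCP server exposing the MindsDB-federated sources
config = {"mcpServers": {"minds": {"command": "python", "args": ["minds_server.py"]}}}

async def main():
    client = MCPClient.from_dict(config)
    agent = MCPAgent(llm=ChatOllama(model="gpt-oss:20b"), client=client)
    print(await agent.run("Which data source holds last quarter's sales?"))

asyncio.run(main())
```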
Aug 7 • 11 tweets • 4 min read
I have been building AI Agents in production for over a year.
If you want to learn too, here's a simple tutorial (hands-on):
Today, we'll build and deploy a Coding Agent that can scrape docs, write production-ready code, solve issues, and raise PRs, directly from Slack.
Tech stack:
- Claude Code for code generation
- @xpander_ai as the Agent backend
- @firecrawl_dev for scraping
Let's begin!
Aug 6 • 14 tweets • 7 min read
12 MCP, RAG, and Agents cheat sheets for AI engineers (with visuals):
1️⃣ Function calling & MCP for LLMs
Before MCP became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.
The visual covers how Function Calling & MCP work under the hood.
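For reference, classic function calling with the OpenAI API looks like this (the tool schema is a toy example):

```python
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Weather in Varanasi?"}],
    tools=tools,  # the model answers with a structured tool call, not free text
)
print(resp.choices[0].message.tool_calls)
```

MCP takes the same idea and standardizes how tools are discovered and served across apps.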
A simple technique makes RAG ~32x memory efficient!
- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant
Let's understand how to use it in RAG systems (with code):
Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.
Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq
Let's build it!
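The core of Binary Quantization is a one-liner: keep only the sign of each embedding dimension. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 1024)).astype(np.float32)

# 1 bit per dimension instead of 32: pack the sign bits into uint8
binary = np.packbits(embeddings > 0, axis=1)   # shape (10_000, 128)

print(embeddings.nbytes / binary.nbytes)  # 32.0x smaller

# Search then uses Hamming distance (XOR + popcount), which is what makes
# sub-30ms queries over tens of millions of vectors feasible.
```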
Aug 3 • 14 tweets • 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PinSage) in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just like image, text, and tabular datasets, graph datasets have features too.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
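A taste with NetworkX: three classic node-level features computed on a toy graph:

```python
import networkx as nx

G = nx.karate_club_graph()

pagerank = nx.pagerank(G)      # global importance of each node
clustering = nx.clustering(G)  # how densely a node's neighbors interconnect

# Per-node feature vectors you can feed to a downstream model
features = {n: [G.degree(n), pagerank[n], clustering[n]] for n in G.nodes}
print(features[0])
```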
Jul 30 • 12 tweets • 5 min read
I have tested 100+ MCP servers in the last 3 months!
Let's use the best 6 to build an ultimate AI assistant for devs (100% local):
Today, we'll build a local ultimate AI assistant using:
- @mcpuse to connect LLM to MCP servers
- @Stagehanddev MCP for browser access
- @firecrawl_dev MCP for scraping
- @ragieai MCP for multimodal RAG
- @zep_ai Graphiti MCP as memory
- Terminal & GitIngest MCP
Let's dive in!
Jul 27 • 11 tweets • 4 min read
KV caching in LLMs, clearly explained (with visuals):
KV caching is a technique used to speed up LLM inference.
Before understanding the internal details, look at the inference speed difference in the video:
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in!
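You can reproduce the effect by toggling the cache in Hugging Face transformers (a sketch; the model choice is illustrative and timings vary by hardware):

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("KV caching works by", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    model.generate(**inputs, max_new_tokens=100, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.time() - start:.2f}s")

# With the cache on, keys/values of past tokens are reused instead of
# recomputed at every decoding step: that reuse is the entire speedup.
```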
Jul 26 • 9 tweets • 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals):
Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.