Simplifying LLMs, AI Agents, RAGs and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
Aug 22 • 14 tweets • 5 min read
Let's build an MCP server (100% local):
Before diving in, here's what we'll be doing today:
- Understand MCP with a simple analogy.
- Build a 100% local and secure MCP client using @mcpuse
- Integrate the client with the @Stagehanddev MCP server
- Use this setup to control and automate the browser
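For orientation, an MCP client is typically pointed at servers through a small JSON config. A hypothetical sketch (the package name and server key below are placeholders — use the ones from the Stagehand and mcp-use docs):

```json
{
  "mcpServers": {
    "stagehand": {
      "command": "npx",
      "args": ["-y", "stagehand-mcp-server"]
    }
  }
}
```

The client reads this config, spawns the server process over stdio, and exposes its browser tools to the LLM.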
Let's go! 🚀
Aug 21 • 15 tweets • 6 min read
A simple technique makes RAG up to 40x faster & 32x more memory-efficient!
- Perplexity uses it in its search index
- Google uses it in Vertex RAG engine
- Azure uses it in its search pipeline
Let's understand how to use it in a RAG system (with code):
Today, we're building a multi-agent legal assistant that can query 50M+ vectors in <30ms using Binary Quantization (BQ).
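The core idea behind Binary Quantization can be sketched in a few lines of NumPy: keep only the sign of each embedding dimension (1 bit instead of a 32-bit float — the 32x memory saving), then rank with cheap Hamming distance. A toy sketch, not the Milvus implementation:

```python
import numpy as np

def binarize(vecs):
    # Binary Quantization: keep only the sign of each dimension (1 bit per dim)
    return (vecs > 0).astype(np.uint8)

def hamming_distance(a, b):
    # Distance between binary codes = number of differing bits
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
docs = rng.standard_normal((4, 8))                # toy "embeddings": 4 docs, 8 dims
query = docs[2] + 0.05 * rng.standard_normal(8)   # near-duplicate of doc 2

doc_codes = binarize(docs)
q_code = binarize(query)

# Rank docs by Hamming distance to the query's binary code
best = min(range(len(docs)), key=lambda i: hamming_distance(doc_codes[i], q_code))
```

In production, the binary codes are usually used for a fast first-pass search, with the original float vectors re-ranking a small shortlist.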
Tech stack:
- @milvusio to self-host vectorDB with BQ
- @firecrawl_dev for web search
- @crewAIInc for orchestration
- @ollama to serve GPT-OSS
Let's go! 🚀
Aug 19 • 11 tweets • 4 min read
JSON prompting for LLMs, clearly explained:
I used to think prompt engineering was dead!
Then I discovered JSON prompting, and everything changed.
Today, I'll show you exactly what JSON prompting is and how it can drastically improve your AI outputs!
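A quick illustration of the idea (the task and schema here are made up): instead of burying instructions in prose, you make the task, constraints, and expected output shape explicit as JSON and send that as the user message.

```python
import json

# A JSON prompt makes the task, constraints, and output schema explicit,
# instead of hiding them in free-form prose.
prompt = {
    "task": "Summarize the customer review below",
    "input": "The battery lasts two days, but the screen scratches easily.",
    "constraints": {"max_words": 25, "tone": "neutral"},
    "output_format": {"summary": "string", "sentiment": "positive | negative | mixed"},
}

# Serialize and send as the user message to any chat-completion API
message = json.dumps(prompt, indent=2)
```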
Let's dive in! 🚀
Aug 18 • 8 tweets • 3 min read
MCP & A2A (Agent2Agent) protocol, clearly explained (with visuals):
Agentic applications require both A2A and MCP.
- MCP provides agents with access to tools.
- A2A allows agents to connect with other agents and collaborate in teams.
Today, let's clearly understand what A2A is and how it can work with MCP.
Aug 17 • 14 tweets • 5 min read
This simple technique can scale training from 1 to 1,000+ GPUs.
- OpenAI uses it to train GPT models
- Google uses it in their TPUs to train Gemini
- Meta uses it to train Llamas on massive GPU clusters
Let's learn how to sync GPUs in multi-GPU training (with visuals):
One major run-time bottleneck in multi-GPU training happens during GPU synchronization.
For instance, in multi-GPU training via data parallelism:
- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.
Check this 👇
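The synchronization step itself can be simulated on one machine: each "GPU" computes a gradient on its own shard, then an all-reduce averages the gradients so every replica applies the identical update. A pure-NumPy sketch (real systems do this with NCCL all-reduce, not a Python loop):

```python
import numpy as np

# Data parallelism, simulated: each "GPU" holds the same weights w
# but sees a different shard of the batch.
rng = np.random.default_rng(42)
w = rng.standard_normal(3)                  # shared model weights
X = rng.standard_normal((8, 3))             # full batch
y = X @ np.array([1.0, -2.0, 0.5])          # targets from a true linear model

shards = np.split(np.arange(8), 4)          # 4 GPUs, 2 samples each

def local_gradient(w, idx):
    # Gradient of mean-squared error on this GPU's shard only
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi) / len(idx)

# The sync step: all-reduce = average gradients across GPUs,
# so every replica takes the same step and the weights stay identical.
grads = [local_gradient(w, idx) for idx in shards]
avg_grad = np.mean(grads, axis=0)
w_new = w - 0.01 * avg_grad
```

With equal shard sizes, the averaged gradient is exactly the full-batch gradient — which is why data parallelism gives the same update as single-GPU training.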
Aug 15 • 10 tweets • 3 min read
Google just dropped a new LLM!
You can run it locally on just 0.5 GB RAM.
Let's fine-tune this on our own data (100% locally):
Google released Gemma 3 270M, a new model for hyper-efficient local AI!
We'll fine-tune this model to make it smart at chess: predicting the next move.
Tech stack:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.
Let's go! 🚀
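Before any fine-tuning, the data has to be shaped into prompt/completion pairs. A hypothetical example of how the chess task could be formatted (the actual dataset and prompt template are in the thread):

```python
# Hypothetical training-example format for next-move prediction.
def format_example(moves_so_far, next_move):
    prompt = "Given the chess moves so far, predict the next move.\n"
    prompt += "Moves: " + " ".join(moves_so_far) + "\n"
    return {"prompt": prompt, "completion": next_move}

example = format_example(["e4", "e5", "Nf3"], "Nc6")
```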
Aug 14 • 13 tweets • 4 min read
How LLMs work, clearly explained:
Before diving into LLMs, we must understand conditional probability.
Let's consider a population of 14 individuals:
- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither
Here's how it looks 👇
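The arithmetic is simple enough to check in code. The counts below are made up for illustration (the actual split is in the visual), but the formula P(Tennis | Football) = P(both) / P(Football) is the point:

```python
# Hypothetical counts for the population of 14:
total = 14
tennis = 7        # like Tennis
football = 6      # like Football
both = 3          # like both
neither = total - (tennis + football - both)   # like neither -> 4

# Conditional probability: restrict attention to football fans,
# then ask what fraction of them also like tennis.
p_tennis_given_football = both / football      # 3/6 = 0.5
```

An LLM does the same thing at scale: given the tokens so far (the condition), it outputs a probability for each possible next token.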
Aug 12 • 7 tweets • 3 min read
Traditional RAG vs. Agentic RAG, clearly explained (with visuals):
Traditional RAG has many issues:
- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.
- It cannot reason through complex queries.
- The system can't modify its strategy based on the problem.
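The contrast can be sketched as control flow. Every function below is a hypothetical stand-in (keyword match instead of a vector DB, a trivial rewrite instead of an LLM), but the loop is the essence of agentic RAG:

```python
def retrieve(query, corpus):
    # stand-in for vector search: naive keyword match
    return [doc for doc in corpus if any(w in doc.lower() for w in query.lower().split())]

def is_sufficient(context):
    # stand-in for an LLM judging whether the context answers the query
    return len(context) > 0

def rewrite_query(query):
    # stand-in for an LLM reformulating the query
    return query.replace("GPU sync", "gradient synchronization")

def generate(query, context):
    return f"Answer to {query!r} using {len(context)} docs"

def agentic_rag(query, corpus, max_rounds=3):
    # Traditional RAG: retrieve once, generate once.
    # Agentic RAG: loop -- retrieve, judge, rewrite -- until the context suffices.
    for _ in range(max_rounds):
        context = retrieve(query, corpus)
        if is_sufficient(context):
            return generate(query, context)
        query = rewrite_query(query)
    return "I don't know"

corpus = ["Gradients are averaged across replicas with all-reduce."]
result = agentic_rag("how does GPU sync work?", corpus)
```

The first retrieval misses, the agent rewrites the query, and the second pass succeeds — exactly the dynamic behavior traditional RAG lacks.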
Aug 10 • 13 tweets • 4 min read
Let's build a Browser Automation Agent using gpt-oss (100% local):
The browser is still the most universal interface, with 4.3 billion pages visited every day!
Here's a quick demo of how we can completely automate it!
Tech stack:
- @stagehanddev open-source AI browser automation
- @crewAIInc for orchestration
- @ollama to run gpt-oss
Let's go!🚀
Aug 9 • 12 tweets • 6 min read
I switched to AI Engineering 2 years ago!
It was the best career move I ever made.
If you want to start today, here's a roadmap:
1️⃣ Master Python
While many are busy vibe coding, those with strong coding fundamentals will always stand out.
Python is the language AI community speaks, and Harvard's CS50p is the best place to learn it.
Let's compare GPT-5 and Claude Opus-4.1 for code generation:
Today, we're building a CodeArena, where you can compare any two code-gen models side-by-side.
Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline
- @OpenRouterAI to access cutting-edge models
- @LightningAI for hosting CodeArena
Let's go!🚀
Aug 6 • 14 tweets • 5 min read
Let's compare OpenAI gpt-oss and Qwen-3 on maths & reasoning:
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline (open-source)
- @OpenRouterAI to access the models
You'll also learn about G-Eval & building custom eval metrics.
Let's go! 🚀
Aug 5 • 13 tweets • 5 min read
Tech giants use Multimodal RAG every day in production!
- Spotify uses it to answer music queries
- YouTube uses it to turn prompts into tracks
- Amazon Music uses it to create playlists from prompts
Let's learn how to build a Multimodal Agentic RAG (with code):
Today, we'll build a multimodal Agentic RAG that can query documents and audio files using the user's speech.
Tech stack:
- @AssemblyAI for transcription.
- @milvusio as the vector DB.
- @beam_cloud for deployment.
- @crewAIInc Flows for orchestration.
Let's build it!
Aug 4 • 13 tweets • 4 min read
Sub-agents in Claude Code, clearly explained:
Claude Code subagents solve two of AI's biggest problems:
- Large-context management
- Right tool selection
This makes it the best AI coding assistant!
Let's understand how to build and use Sub-agents in Claude Code:
Aug 3 • 10 tweets • 4 min read
uv in Python, clearly explained (with code):
uv is incredibly fast.
- Creating virtual envs with uv is ~80x faster than python -m venv.
- Package installation is 4–12x faster without caching, and ~100x faster with caching.
Today, let's understand how to use uv for Python package management.
Let's dive in!
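For a taste, here are the core commands the thread walks through (annotated listing, assuming uv is already installed):

```shell
uv venv .venv                 # create a virtual environment (replaces python -m venv)
uv pip install requests       # install packages (pip-compatible interface)
uv init myproject             # scaffold a new project with pyproject.toml
uv add numpy                  # add a dependency to the project
uv run python main.py        # run a command inside the project environment
```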
Aug 1 • 14 tweets • 5 min read
Let's build a (Text2SQL + RAG), hybrid agentic workflow:
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI to validate the response
- @OpenRouterAI to access the latest Qwen3
Let's go! 🚀
Jul 31 • 17 tweets • 6 min read
"Attention is all you need" implemented from scratch using PyTorch:
This is the paper that revolutionized AI!
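The heart of the paper is one equation: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy version of that single step (the full thread builds multi-head attention and the rest of the Transformer on top of it):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # token-to-token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V, weights                         # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))   # 4 tokens, d_k = 8
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1: every output token is a convex combination of the value vectors.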
I have been fine-tuning LLMs for more than 2 years now!
Here are the top 5 LLM fine-tuning techniques, explained with visuals:
Traditional fine-tuning is impractical for LLMs (billions of parameters; hundreds of GBs of memory).
Since this kind of computing isn't accessible to everyone, parameter-efficient finetuning (PEFT) came into existence.
Today, we’ll cover the top 5 PEFT techniques, step by step.
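To see why PEFT is so much cheaper, here is a toy sketch of LoRA, one of the most popular PEFT techniques: freeze the pretrained weight W and learn only a low-rank update BA.

```python
import numpy as np

# LoRA: freeze W, train only the low-rank factors A and B,
# so r*(d_in + d_out) parameters train instead of d_in*d_out.
d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so the update starts at 0

def forward(x):
    return W @ x + B @ (A @ x)           # adapted layer: (W + BA) x

x = rng.standard_normal(d_in)

trainable = A.size + B.size              # 512 parameters
frozen = W.size                          # 4096 parameters
```

Here only 512 of 4,608 parameters train (~11%); for a billion-parameter LLM the ratio is far more dramatic, which is what makes fine-tuning on a single GPU possible.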
Jul 25 • 10 tweets • 3 min read
How LLMs train LLMs, clearly explained (with visuals):
LLMs learn not only from raw text but also from other models.
Google’s Gemma 2 and 3, for example, were distilled from the larger Gemini model.
Today, we cover the three most common knowledge-distillation methods.
Let's dive in! 🚀
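The classic recipe (Hinton-style soft-label distillation) trains the student to match the teacher's softened output distribution via KL divergence. A minimal sketch:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                      # temperature T > 1 softens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)   # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
student_good = np.array([3.8, 1.1, 0.4])   # close to the teacher
student_bad = np.array([0.2, 3.0, 1.0])    # far from the teacher
```

The loss is lower for the student whose logits track the teacher's, which is exactly the gradient signal that transfers the larger model's knowledge.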
Jul 24 • 13 tweets • 4 min read
Let's build a "Chat with your Code" RAG app using Qwen3-Coder:
Before we begin, take a look at what we're about to create!
Tech stack:
- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI codex to validate the response
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.
Let's go! 🚀
Jul 23 • 15 tweets • 5 min read
I just built the ultimate MCP server for Multimodal AI.
It lets you do RAG over audio, video, images and text!
100% open-source, here's the full breakdown...👇
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @pixeltablehq to build the multi-modal AI infrastructure
- @crewAIInc to orchestrate the agentic workflow
Quickly check the thread, then return here for a detailed overview. 🚀