Akshay 🚀
Simplifying LLMs, AI Agents, RAGs and Machine Learning for you! • Co-founder @dailydoseofds_• BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
30 subscribers
Sep 12 12 tweets 4 min read
10 MCP, AI Agents & LLM visual explainers:

(don't forget to bookmark 🔖) 1️⃣ MCP

MCP is a standardized way for LLMs to access tools via a client–server architecture.

Think of it as a JSON schema with agreed-upon endpoints.

Anthropic said, "Hey, let's all use the same JSON format when connecting AI to tools" and everyone said "Sure."

Check this👇
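To make the "JSON schema with agreed-upon endpoints" idea concrete, here's a minimal sketch of an MCP tool server using the official Python SDK's FastMCP helper. This is not from the thread itself; the tool and server name are made up for illustration.

```python
# Minimal MCP tool server sketch, assuming the official Python SDK (pip install mcp).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-tools")  # server name is arbitrary

@mcp.tool()
def get_temperature(city: str) -> str:
    """Return a (dummy) temperature reading for a city."""
    return f"It is 22°C in {city} right now."

if __name__ == "__main__":
    mcp.run()  # exposes the tool over MCP's client-server transport (stdio by default)
```

Any MCP-compatible client (Claude Desktop, an agent framework, etc.) can now discover and call `get_temperature` through the same standardized interface.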
Sep 11 13 tweets 5 min read
I've put 100+ MCP apps into production!

There's one rule you cannot miss if you want to do the same!

Here's the full breakdown (with code): Two factors primarily determine how well an MCP app works:

- Is the model selecting the right tool?
- Is it correctly preparing the tool call?

Today, let's learn how to evaluate any MCP workflow using @deepeval's MCP evaluations (open-source).

Let's go!
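A minimal sketch of checking the first factor ("did the model pick the right tool?") with deepeval. The class names below (LLMTestCase, ToolCall, ToolCorrectnessMetric) follow deepeval's public API as I understand it; treat the exact signatures as assumptions and see the thread for the full MCP evaluation flow.

```python
# Compare the tools the MCP client actually called against the tools it should have called.
from deepeval import evaluate
from deepeval.test_case import LLMTestCase, ToolCall
from deepeval.metrics import ToolCorrectnessMetric

test_case = LLMTestCase(
    input="What's the weather in Paris?",
    actual_output="It is 22°C in Paris right now.",
    tools_called=[ToolCall(name="get_temperature")],    # what the agent actually invoked
    expected_tools=[ToolCall(name="get_temperature")],  # what it should have invoked
)

evaluate(test_cases=[test_case], metrics=[ToolCorrectnessMetric()])
```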
Sep 9 10 tweets 3 min read
6 GitHub repositories that will give you superpowers as an AI Engineer: You can use these 6 open-source repos/tools for:

- building an enterprise-grade RAG solution
- building and deploying multi-agent workflows
- fine-tuning 100+ LLMs
- and more...

Let's learn more about them one by one:
Sep 7 12 tweets 4 min read
8 key skills to become a full-stack AI Engineer: Production-grade AI systems demand deep understanding of how LLMs are engineered, deployed, and optimized.

Here are the 8 pillars that define serious LLM development:

Let's dive in! 🚀
Sep 6 9 tweets 3 min read
K-Means has two major problems:

- The number of clusters must be known
- It doesn't handle outliers

Here’s an algorithm that addresses both issues: Introducing DBSCAN, a density-based clustering algorithm.

Simply put, DBSCAN groups together points in a dataset that are close to each other based on their spatial density.

It's very easy to understand; just follow along 👇
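For reference, here's what DBSCAN looks like in scikit-learn. It addresses both K-Means problems from above: you don't specify the number of clusters, and sparse points are labeled as noise (-1). The toy data is made up for the sketch.

```python
# DBSCAN: density-based clustering, no cluster count needed, outliers get label -1.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
outliers = np.array([[10.0, 10.0], [-8.0, 7.0]])
X = np.vstack([blob_a, blob_b, outliers])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(set(labels))  # e.g. {0, 1, -1}: two dense clusters plus noise points
```

`eps` (neighborhood radius) and `min_samples` (points needed to form a dense region) are the only knobs, and both have geometric meaning you can reason about.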
Sep 4 12 tweets 4 min read
Let's build a reasoning LLM, from scratch (100% local): Today, we're going to learn how to turn any model into a reasoning powerhouse.

We'll do so without any labeled data or human intervention, using Reinforcement Finetuning (GRPO)!

Tech stack:

- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO

Let's go! 🚀
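As a rough sketch of the GRPO side of this, here's what reinforcement finetuning with TRL's GRPOTrainer can look like. The reward function below is a toy "did the model show its reasoning in <think> tags?" check; the thread's actual rewards, dataset, and Unsloth integration are not reproduced, and the model id is just a placeholder.

```python
# GRPO-style reinforcement finetuning sketch with Hugging Face TRL (no labeled data needed,
# only a programmatic reward).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

dataset = Dataset.from_dict({"prompt": ["What is 17 * 23?", "Is 97 prime?"]})

def format_reward(completions, **kwargs):
    # Reward completions that expose their reasoning inside <think>...</think>.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model; any causal LM works
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-demo", num_generations=4, max_completion_length=128),
    train_dataset=dataset,
)
trainer.train()
```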
Sep 2 13 tweets 5 min read
4 stages of training LLMs from scratch, clearly explained (with visuals): Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases.

We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

The visual summarizes these techniques.

Let's dive in!
Aug 30 14 tweets 5 min read
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Let's understand how you can use it in LLM apps (with code): Today, we'll use the voyage-context-3 embedding model by @VoyageAI to do RAG over audio data.

We'll also use:
- @MongoDB Atlas Vector Search as vector DB
- @AssemblyAI for transcription
- @llama_index for orchestration
- gpt-oss as the LLM

Let's begin!
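Here's a rough sketch of the front half of that pipeline: transcribe the audio with AssemblyAI, then embed the transcript chunks with voyage-context-3. The `contextualized_embed` call and its result shape reflect Voyage's SDK as I understand it, so treat them as assumptions; the MongoDB Atlas, LlamaIndex, and gpt-oss wiring from the thread is omitted here.

```python
# RAG over audio, step 1-2: transcription + contextualized embeddings.
import assemblyai as aai
import voyageai

aai.settings.api_key = "ASSEMBLYAI_API_KEY"
transcript = aai.Transcriber().transcribe("meeting.mp3").text

# Naive fixed-size chunking; the thread likely uses something smarter.
chunks = [transcript[i:i + 800] for i in range(0, len(transcript), 800)]

vo = voyageai.Client(api_key="VOYAGE_API_KEY")
result = vo.contextualized_embed(
    inputs=[chunks],              # all chunks of one document share context
    model="voyage-context-3",
    input_type="document",
)
embeddings = result.results[0].embeddings
print(len(embeddings), "chunk embeddings ready for the vector DB")
```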
Aug 29 11 tweets 4 min read
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training: Before we dive in, the following visual covers what we're discussing today.

Let's understand them in detail below!
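The visual isn't reproduced here, but two of the most widely used optimizations of this kind are mixed-precision training and gradient accumulation (I'm assuming they're among the 16). A minimal PyTorch sketch with dummy data:

```python
# Mixed precision (autocast + GradScaler) plus gradient accumulation in PyTorch.
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # simulate a 4x larger batch without 4x the memory

data = [(torch.randn(32, 512), torch.randint(0, 10, (32,))) for _ in range(8)]

for step, (x, y) in enumerate(data):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda()) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscale + optimizer step
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```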
Aug 26 13 tweets 4 min read
I boosted my AI Agent's performance by 184%

Using a fully open-source, automatic technique

Here's a breakdown (with code): Top AI Engineers never do manual prompt engineering.

Today, I'll show you how to automatically find the best prompts for any agentic workflow you're building.

We'll use @Cometml's 100% open-source Opik to do so.

Let's go! 🚀
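Opik's optimizer API isn't reproduced here; the sketch below only illustrates the core idea behind automatic prompt optimization: generate candidate prompts, score each one against a small eval set, and keep the winner. `call_llm` and `score` are hypothetical stand-ins for your model call and metric.

```python
# Generic automatic prompt search loop (not Opik's actual API).
def optimize_prompt(candidates, eval_set, call_llm, score):
    best_prompt, best_score = None, float("-inf")
    for prompt in candidates:
        total = 0.0
        for example in eval_set:
            output = call_llm(prompt.format(**example["inputs"]))
            total += score(output, example["expected"])
        avg = total / len(eval_set)
        if avg > best_score:
            best_prompt, best_score = prompt, avg
    return best_prompt, best_score
```

Tools like Opik automate the candidate generation and scoring steps and log every trial, which is what makes the search reproducible.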
Aug 24 12 tweets 5 min read
After MCP, A2A, & AG-UI, there's another Agent protocol.

It's fully open-source and launched by IBM Research.

Here's a complete breakdown (with code): ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.

Just like A2A, it lets Agents communicate with other Agents. There are some differences, which we'll discuss later.

Let's dive into the code first!
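Because ACP is REST-based, invoking a remote agent is roughly just an HTTP POST. The endpoint path and payload shape below are illustrative assumptions for the sketch, not the exact ACP spec; see the thread and IBM's docs for the real schema.

```python
# Illustrative only: calling a (hypothetical) ACP agent endpoint over plain HTTP.
import requests

resp = requests.post(
    "http://localhost:8000/runs",                       # hypothetical ACP server URL
    json={
        "agent_name": "research_agent",                 # hypothetical agent name
        "input": [{"role": "user", "content": "Summarize today's AI news."}],
    },
    timeout=60,
)
print(resp.json())
```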
Aug 22 14 tweets 5 min read
Let's build an MCP server (100% local): Before diving in, here's what we'll be doing today:

- Understand MCP with a simple analogy.
- Build a 100% local and secure MCP client using @mcpuse
- Integrate the client with the @Stagehanddev MCP server
- Use this setup to control and automate the browser

Let's go! 🚀
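Here's a minimal sketch of the client side with mcp-use driving the Stagehand MCP server through a local LLM. The class names and config shape follow mcp-use's docs as I recall them, and the Stagehand server command is an assumption, so treat the details as placeholders rather than the thread's exact code.

```python
# Local MCP client sketch: mcp-use agent + Ollama LLM + Stagehand MCP server.
import asyncio
from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

config = {
    "mcpServers": {
        # Assumed launch command for the Stagehand MCP server.
        "stagehand": {"command": "npx", "args": ["-y", "@browserbasehq/mcp-stagehand"]}
    }
}

async def main():
    client = MCPClient.from_dict(config)
    agent = MCPAgent(llm=ChatOllama(model="llama3.1"), client=client, max_steps=20)
    result = await agent.run("Open example.com and summarize the page.")
    print(result)

asyncio.run(main())
```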
Aug 21 15 tweets 6 min read
A simple technique makes RAG up to 40x faster and 32x more memory efficient!

- Perplexity uses it in its search index
- Google uses it in Vertex RAG engine
- Azure uses it in its search pipeline

Let's understand how to use it in a RAG system (with code): Today, we're building a multi-agent legal assistant that can query 50M+ vectors in <30ms using Binary Quantization (BQ).

Tech stack:

- @milvusio to self-host vectorDB with BQ
- @firecrawl_dev for web search
- @crewAIInc for orchestration
- @ollama to serve GPT-OSS

Let's go! 🚀
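Independent of Milvus, here's Binary Quantization in a nutshell: keep only the sign of each embedding dimension, pack it into bits, and compare vectors with Hamming distance. Going from 32-bit floats to 1 bit per dimension is where the ~32x memory saving comes from. The data below is random, just to show the mechanics.

```python
# Binary Quantization sketch: sign -> bits, then Hamming distance for search.
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    return np.packbits(vectors > 0, axis=-1)            # 1 bit per dimension

def hamming_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.unpackbits(a ^ b, axis=-1).sum(axis=-1)   # XOR then popcount

docs = np.random.randn(10_000, 1024).astype(np.float32)
query = np.random.randn(1024).astype(np.float32)

doc_codes, query_code = binarize(docs), binarize(query[None, :])
nearest = np.argsort(hamming_distance(doc_codes, query_code))[:5]
print("top-5 candidate ids:", nearest)
```

In production systems the binary search is usually a first-pass filter, with the top candidates re-scored using the original full-precision vectors.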
Aug 19 11 tweets 4 min read
JSON prompting for LLMs, clearly explained: I used to think prompt engineering was dead!

Then I discovered JSON prompting and everything changed.

Today, I'll show you exactly what JSON prompting is and how it can drastically improve your AI outputs!

Let's dive in! 🚀
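To show the contrast up front, here's the same request written as a free-text prompt and as a JSON prompt. The field names are just an example of the pattern, not a fixed schema.

```python
# Free-text prompt vs. JSON prompt: the JSON version makes the task, constraints,
# and expected output schema explicit and easy for the model (and you) to parse.
import json

free_text_prompt = "Summarize this review and tell me if it's positive or negative."

json_prompt = json.dumps({
    "task": "summarize_review",
    "input": "{review_text}",
    "constraints": {"max_words": 40, "tone": "neutral"},
    "output_format": {"summary": "string", "sentiment": "positive | negative | neutral"},
}, indent=2)

print(json_prompt)  # fill in the review text and send this as the LLM prompt
```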
Aug 18 8 tweets 3 min read
MCP & A2A (Agent2Agent) protocol, clearly explained (with visuals): Agentic applications require both A2A and MCP.

- MCP provides agents with access to tools.
- A2A allows agents to connect with other agents and collaborate in teams.

Today, let's clearly understand what A2A is and how it can work with MCP.
Aug 17 14 tweets 5 min read
This simple technique can scale training from 1 to 1,000+ GPUs.

- OpenAI uses it to train GPT models
- Google uses it in their TPUs to train Gemini
- Meta uses it to train Llamas on massive GPU clusters

Let's learn how to sync GPUs in multi-GPU training (with visuals): One major runtime bottleneck in multi-GPU training happens during GPU synchronization.

For instance, in multi-GPU training via data parallelism:

- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.

Check this 👇
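The core sync primitive here is all-reduce: after the backward pass, every GPU averages its gradients with every other GPU so the model replicas stay identical. A minimal torch.distributed sketch of that step, written explicitly for clarity:

```python
# Gradient all-reduce in data parallelism. Launch with:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import torch
import torch.distributed as dist

def sync_gradients(model: torch.nn.Module):
    """Average gradients across all GPUs so every replica takes the same optimizer step."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")          # torchrun sets rank/world_size env vars
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    model = torch.nn.Linear(10, 10).cuda()
    model(torch.randn(4, 10).cuda()).sum().backward()
    sync_gradients(model)
```

In practice you'd wrap the model in DistributedDataParallel, which performs this same all-reduce but overlaps the communication with the backward pass.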
Aug 15 10 tweets 3 min read
Google just dropped a new LLM!

You can run it locally on just 0.5 GB RAM.

Let's fine-tune this on our own data (100% locally): Google released Gemma 3 270M, a new model for hyper-efficient local AI!

We'll fine-tune this model and make it very good at playing chess and predicting the next move.

Tech stack:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.

Let's go! 🚀
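A hedged sketch of the loading step with Unsloth: the exact model id and argument defaults are assumptions based on Unsloth's usual API, and the chess dataset plus training loop from the thread are omitted.

```python
# Load Gemma 3 270M with Unsloth and attach LoRA adapters for fine-tuning.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",  # assumed model id
    max_seq_length=2048,
    load_in_4bit=False,                    # the model is tiny; full precision is fine
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, train with TRL's SFTTrainer on (board position -> best move) pairs.
```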
Aug 14 13 tweets 4 min read
How LLMs work, clearly explained: Before diving into LLMs, we must understand conditional probability.

Let's consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇
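The thread's visual has the exact split of the 14 people; the counts below are made up purely to show the conditional-probability arithmetic the rest of the explanation builds on.

```python
# Assumed counts out of 14 people: 8 like tennis, 7 like football, 4 like both,
# so 3 like neither (8 + 7 - 4 + 3 = 14).
tennis, football, both = 8, 7, 4

p_football_given_tennis = both / tennis   # P(⚽ | 🎾) = P(⚽ and 🎾) / P(🎾)
print(p_football_given_tennis)            # 0.5: half the tennis fans also like football
```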
Aug 12 7 tweets 3 min read
Traditional RAG vs. Agentic RAG, clearly explained (with visuals): Traditional RAG has many issues:

- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.

- It cannot reason through complex queries.

- The system can't modify its strategy based on the problem.
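The difference fits in one loop: traditional RAG retrieves once and generates once, while an agentic RAG loop lets the model judge whether the retrieved context is enough and, if not, rewrite the query and retrieve again. In this sketch, `retrieve`, `is_sufficient`, `rewrite_query`, and `generate` are hypothetical stand-ins for your own retriever and LLM calls.

```python
# Agentic RAG loop: retrieve -> let the LLM judge the context -> refine and retry if needed.
def agentic_rag(question, retrieve, is_sufficient, rewrite_query, generate, max_rounds=3):
    query = question
    context = []
    for _ in range(max_rounds):
        context += retrieve(query)
        if is_sufficient(question, context):       # LLM-as-judge on the gathered context
            break
        query = rewrite_query(question, context)   # reason about what's missing, search again
    return generate(question, context)
```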
Aug 10 13 tweets 4 min read
Let's build a Browser Automation Agent using gpt-oss (100% local): The browser is still the most universal interface, with 4.3 billion pages visited every day!

Here's a quick demo of how we can completely automate it!

Tech stack:

- @stagehanddev open-source AI browser automation
- @crewAIInc for orchestration
- @ollama to run gpt-oss

Let's go!🚀
Aug 9 12 tweets 6 min read
I switched to AI Engineering 2 years ago!

It was the best career move I ever made.

If you want to start today, here's a roadmap: 1️⃣ Master Python

While many are busy vibe coding, those with strong coding fundamentals will always stand out.

Python is the language the AI community speaks, and Harvard's CS50P is the best place to learn it.

🔗 pll.harvard.edu/course/cs50s-i…