Akshay 🚀
Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_• BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
Oct 20 11 tweets 4 min read
You're in an ML Engineer interview at OpenAI.

The interviewer asks:

"Our GPT model generates 100 tokens in 42 seconds. How do you make it 5x faster?"

You: "I'll optimize the model architecture and use a better GPU."

Interview over.

Here's what you missed: The real bottleneck isn't compute—it's redundant computation.

Without KV caching, your model recalculates the keys and values of every previous token at each decoding step, repeating work it has already done.

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Check this out👇
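To see where the redundant work comes from, here's a minimal NumPy sketch of single-head decoding with and without a KV cache. The dimension `d`, the random weights, and the function names are illustrative assumptions. It counts key/value projections instead of measuring wall-clock time, so it won't reproduce the 42 vs. 9 second figures.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical head dimension
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(q, K, V):
    """Single-head attention of one query over all available positions."""
    weights = softmax(K @ q / np.sqrt(d))
    return weights @ V


def decode_without_cache(tokens):
    """Re-project keys and values for the whole prefix at every step."""
    outputs, kv_projections = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        K = prefix @ W_k.T  # recomputed for all t tokens
        V = prefix @ W_v.T
        kv_projections += 2 * t
        outputs.append(attend(W_q @ prefix[-1], K, V))
    return np.array(outputs), kv_projections


def decode_with_cache(tokens):
    """Project only the newest token and reuse the cached keys and values."""
    outputs, kv_projections = [], 0
    K_cache, V_cache = [], []
    for x in tokens:
        K_cache.append(W_k @ x)
        V_cache.append(W_v @ x)
        kv_projections += 2
        outputs.append(attend(W_q @ x, np.array(K_cache), np.array(V_cache)))
    return np.array(outputs), kv_projections


tokens = rng.standard_normal((100, d))  # stand-in for 100 token embeddings
out_slow, n_slow = decode_without_cache(tokens)
out_fast, n_fast = decode_with_cache(tokens)
print(np.allclose(out_slow, out_fast), n_slow, n_fast)
```

Both paths produce the same outputs, but for 100 tokens the uncached path performs 10,100 key/value projections while the cached path performs 200.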
Oct 6 10 tweets 4 min read
You're in a Research Scientist interview at OpenAI.

The interviewer asks:

"How would you expand the context length of an LLM from 2K to 128K tokens?"

You: "I will fine-tune the model on longer docs with 128K context."

Interview over.

Here's what you missed: Extending the context window isn't just about larger matrices.

In a traditional transformer, increasing the number of tokens by 8x increases attention memory by 64x, because attention scales quadratically with sequence length. Refer to the image below!
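The arithmetic is quick to check. This small sketch assumes the full attention score matrix (one entry per token pair) is materialized for a single head and layer. The function name and the 2,048-token baseline are illustrative.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one full seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


base = 2_048  # 2K tokens

# 8x more tokens
ratio_8x = attention_score_entries(8 * base) / attention_score_entries(base)

# 2K -> 128K is 64x more tokens
ratio_2k_to_128k = attention_score_entries(131_072) / attention_score_entries(base)

print(ratio_8x, ratio_2k_to_128k)
```

Under this assumption, 8x more tokens means 64x more score entries, and going from 2K to 128K means 4,096x more.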

So, how do we manage it?

Continue...👇

[Image]
Sep 25 11 tweets 4 min read
Local MCP clients are so underrated!

Everyone's using Cursor, Claude Desktop, and ChatGPT as MCP hosts, but if you're building your own apps that support MCP, you need custom clients.

Here's the problem: Writing MCP clients from scratch is painful and time-consuming.

Today, I'm showing you how to build custom MCP clients in minutes, not hours.

To prove this, I built a fully private, ultimate AI assistant that can:

- Connect to any MCP server
- Automate browser usage
- Scrape web data seamlessly
- Control the terminal of my computer
- Process images, audio, and documents
- Remember everything with knowledge graphs

The secret? mcp-use — a 100% open-source framework that makes MCP integration trivial.

Building custom MCP agents takes 3 steps:

1. Define your MCP server configuration
2. Connect any LLM with the MCP client
3. Deploy your agent

That's it. No complex setup, no proprietary dependencies.

The best part? Everything runs locally. Your data stays private, and you control the entire stack.
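As a rough idea of what step 1 can look like, here's a sketch of a server configuration in the `mcpServers` layout that many MCP hosts accept. The `playwright` entry, its package name, and the `server_names` helper are illustrative assumptions, not taken from the thread. Steps 2 and 3 need the mcp-use library plus an LLM, so they aren't shown here.

```python
import json

# Hypothetical server entry: the name and the npx package are placeholders.
config = {
    "mcpServers": {
        "playwright": {
            "command": "npx",
            "args": ["@playwright/mcp@latest"],
        }
    }
}


def server_names(cfg: dict) -> list:
    """Sanity-check the config and return the configured server names.

    These checks are this sketch's own assumptions, not rules from any library.
    """
    servers = cfg.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        raise ValueError("config needs a non-empty 'mcpServers' mapping")
    for name, spec in servers.items():
        if "command" not in spec:
            raise ValueError(f"server {name!r} needs a 'command' to launch it")
    return sorted(servers)


print(server_names(config))
print(json.dumps(config, indent=2))  # the same structure as a JSON config file
```

Keeping the configuration as plain data means the same dictionary can live in a JSON file and be shared across clients.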

Full breakdown with code...👇

Let's break this down by exploring each integration and understanding how it works, using code and illustrations:
Sep 23 11 tweets 4 min read
Context engineering, clearly explained!

Everybody is talking about context engineering, but no one tells you what it actually means.

Today, I'll explain everything you need to know about context engineering in a step-by-step manner.

Here's an illustrated guide:

So, what is context engineering?

It’s the art and science of delivering the right information, in the right format, at the right time, to your LLM.

Here's a quote by Andrej Karpathy on context engineering...👇

[Image]
Sep 19 7 tweets 3 min read
We've all dealt with activation functions while working with neural nets.

- Sigmoid
- Tanh
- ReLU & Leaky ReLU
- GELU

Ever wondered why they are so important❓🤔

Let me explain... 👇

[Image]

Before we proceed, I want you to understand something!

You can think of a layer in a neural net as a function & multiple layers make the network a composite function.

Now, a composite function consisting of individual linear functions is also linear.

Check this out👇

[Image]
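Here's a tiny NumPy check of that claim. The weights and inputs are made-up numbers chosen so the arithmetic is easy to follow by hand. Two stacked linear layers collapse into a single matrix, while inserting a ReLU between them breaks additivity, which every linear function must satisfy.

```python
import numpy as np

# Made-up weights for a 2 -> 2 -> 1 network (no biases, for simplicity).
W1 = np.array([[1.0, -1.0],
               [2.0, 0.5]])
W2 = np.array([[1.0, 1.0]])


def linear_net(v):
    """Two linear layers with no activation in between."""
    return W2 @ (W1 @ v)


def relu_net(v):
    """Same layers, with a ReLU after the first one."""
    return W2 @ np.maximum(W1 @ v, 0.0)


x = np.array([0.5, -2.0])
collapsed = W2 @ W1  # a single equivalent layer
same_as_one_layer = np.allclose(linear_net(x), collapsed @ x)

# Additivity test: f(a + b) == f(a) + f(b) holds for any linear f.
a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
linear_is_additive = np.allclose(linear_net(a + b), linear_net(a) + linear_net(b))
relu_is_additive = np.allclose(relu_net(a + b), relu_net(a) + relu_net(b))

print(same_as_one_layer, linear_is_additive, relu_is_additive)
```

Without the activation, extra layers add no expressive power. With it, the network is no longer a single linear map.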
Sep 12 12 tweets 4 min read
10 MCP, AI Agents & LLM visual explainers:

(don't forget to bookmark 🔖)

1️⃣ MCP

MCP is a standardized way for LLMs to access tools via a client–server architecture.

Think of it as a JSON schema with agreed-upon endpoints.

Anthropic said, "Hey, let's all use the same JSON format when connecting AI to tools" and everyone said "Sure."

Check this👇
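To make the "same JSON format" idea concrete, here's a simplified sketch modeled on the JSON-RPC 2.0 messages that MCP uses for tool calls. The `add` tool, the in-process `handle` function, and the trimmed-down fields are illustrative assumptions. A real server also handles initialization, tool listing, and a transport layer.

```python
import json

# Hypothetical tool registry on the "server" side.
TOOLS = {"add": lambda a, b: a + b}


def make_tool_call(request_id, name, arguments):
    """Build a tools/call request in JSON-RPC 2.0 shape."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def handle(request):
    """Toy dispatcher: look up the tool, run it, wrap the result as text."""
    tool = TOOLS.get(request["params"]["name"])
    if request.get("method") != "tools/call" or tool is None:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": "Method or tool not found"},
        }
    result = tool(**request["params"]["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": str(result)}]},
    }


# Serialize and parse the request, as it would be when sent over a transport.
wire = json.dumps(make_tool_call(1, "add", {"a": 2, "b": 3}))
response = handle(json.loads(wire))
print(json.dumps(response, indent=2))
```

Because every client and server agrees on this envelope, any compliant client can call any compliant server's tools.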
Sep 11 13 tweets 5 min read
I've put 100+ MCP apps into production!

There's one rule you cannot miss if you want to do the same!

Here's the full breakdown (with code):

There are primarily 2 factors that determine how well an MCP app works:

- Is the model selecting the right tool?
- Is it correctly preparing the tool call?
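Those two factors can be scored separately. The sketch below is a hand-rolled illustration, not deepeval's API. The `ToolCall` class, both scoring functions, and the example calls are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


def tool_selection_score(expected, actual):
    """Fraction of expected tools that the model actually called."""
    if not expected:
        return 1.0
    called = {call.name for call in actual}
    return sum(call.name in called for call in expected) / len(expected)


def argument_score(expected, actual):
    """Fraction of expected calls matched by name AND exact arguments."""
    if not expected:
        return 1.0
    hits = sum(
        any(e.name == a.name and e.arguments == a.arguments for a in actual)
        for e in expected
    )
    return hits / len(expected)


expected = [
    ToolCall("search", {"query": "mcp"}),
    ToolCall("fetch", {"url": "https://example.com"}),
]
actual = [
    ToolCall("search", {"query": "mcp"}),
    ToolCall("fetch", {"url": "https://example.org"}),  # right tool, wrong URL
]

selection = tool_selection_score(expected, actual)
arguments = argument_score(expected, actual)
print(selection, arguments)
```

Here the model picks both tools correctly (selection 1.0) but prepares only one call correctly (arguments 0.5), which is why the two checks are worth tracking separately.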

Today, let's learn how to evaluate any MCP workflow using @deepeval's MCP evaluations (open-source).

Let's go!
Sep 9 10 tweets 3 min read
6 GitHub repositories that will give you superpowers as an AI Engineer:

You can use these 6 open-source repos/tools for:

- building an enterprise-grade RAG solution
- building and deploying multi-agent workflows
- fine-tuning 100+ LLMs
- and more...

Let's learn more about them one by one:

[Image]
Sep 7 12 tweets 4 min read
8 key skills to become a full-stack AI Engineer:

Production-grade AI systems demand a deep understanding of how LLMs are engineered, deployed, and optimized.

Here are the 8 pillars that define serious LLM development:

Let's dive in! 🚀
Sep 6 9 tweets 3 min read
K-Means has two major problems:

- The number of clusters must be known in advance
- It doesn't handle outliers

Here’s an algorithm that addresses both issues:

Introducing DBSCAN, a density-based clustering algorithm.

Simply put, DBSCAN groups together points that are packed closely together, i.e., points in high-density regions, and flags points in sparse regions as outliers.

It's very easy to understand. Just follow along...👇

[Image]
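Here's a compact, from-scratch sketch of the idea in NumPy. It is written for readability on small inputs and is not an optimized implementation. The toy points and the `eps` and `min_samples` values are made up for illustration. Note that the number of clusters is never passed in, and the isolated point ends up labeled as noise.

```python
import numpy as np

NOISE, UNVISITED = -1, -2


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a toy example).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Each point counts as its own neighbor.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, UNVISITED)
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but don't expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core: keep growing
        cluster += 1
    return labels


points = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1],          # dense group A
     [10, 10], [10, 11], [11, 10], [11, 11],  # dense group B
     [50, 50]],                               # isolated point
    dtype=float,
)
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels.tolist())
```

On this toy data, the two dense groups become clusters 0 and 1, and the isolated point gets the label -1.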
Sep 4 12 tweets 4 min read
Let's build a reasoning LLM, from scratch (100% local):

Today, we're going to learn how to turn any model into a reasoning powerhouse.

We'll do so without any labeled data or human intervention, using Reinforcement Finetuning (GRPO)!

Tech stack:

- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO

Let's go! 🚀
Sep 2 13 tweets 5 min read
4 stages of training LLMs from scratch, clearly explained (with visuals):

Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases.

We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

The visual summarizes these techniques.

Let's dive in!
Aug 30 14 tweets 5 min read
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Let's understand how you can use it in LLM apps (with code):

Today, we'll use the voyage-context-3 embedding model by @VoyageAI to do RAG over audio data.

We'll also use:
- @MongoDB Atlas Vector Search as vector DB
- @AssemblyAI for transcription
- @llama_index for orchestration
- gpt-oss as the LLM

Let's begin!
Aug 29 11 tweets 4 min read
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training:

Before we dive in, the following visual covers what we are discussing today.

Let's understand them in detail below!
Aug 26 13 tweets 4 min read
I boosted my AI Agent's performance by 184%

Using a fully open-source, automatic technique

Here's a breakdown (with code):

Top AI Engineers never do manual prompt engineering.

Today, I'll show you how to automatically find the best prompts for any agentic workflow you're building.

We'll use @Cometml's 100% open-source Opik to do so.

Let's go! 🚀

[Image]
Aug 24 12 tweets 5 min read
After MCP, A2A, & AG-UI, there's another Agent protocol.

It's fully open-source and launched by IBM Research.

Here's a complete breakdown (with code):

[Image]

ACP is a standardized, RESTful interface for Agents to discover and coordinate with other Agents, regardless of their framework.

Just like A2A, it lets Agents communicate with Agents. There are some differences, which we shall discuss later.

Let's dive into the code first!
Aug 22 14 tweets 5 min read
Let's build an MCP server (100% local):

Before diving in, here's what we'll be doing today:

- Understand MCP with a simple analogy.
- Build a 100% local and secure MCP client using @mcpuse
- Integrate the client with the @Stagehanddev MCP server
- Use this setup to control and automate a browser

Let's go! 🚀
Aug 21 15 tweets 6 min read
A simple technique makes RAG up to 40x faster & 32x more memory-efficient!

- Perplexity uses it in its search index
- Google uses it in Vertex RAG engine
- Azure uses it in its search pipeline

Let's understand how to use it in a RAG system (with code):

Today, we're building a multi-agent legal assistant that can query 50M+ vectors in <30ms using Binary Quantization (BQ).

Tech stack:

- @milvusio to self-host vectorDB with BQ
- @firecrawl_dev for web search
- @crewAIInc for orchestration
- @ollama to serve GPT-OSS

Let's go! 🚀
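Before wiring up the stack, here's a minimal NumPy sketch of what binary quantization does on its own. It is not how Milvus implements it. The random embeddings, the 256-dimension size, and the helper names are assumptions for illustration. Each float32 dimension is reduced to its sign bit, and search uses Hamming distance on the packed bits.

```python
import numpy as np


def binarize(vectors):
    """Keep only the sign of each dimension and pack 8 dimensions per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_distances(query_code, codes):
    """Number of differing bits between one packed query and each packed row."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=1).sum(axis=1)


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 256)).astype(np.float32)  # stand-in data
codes = binarize(embeddings)

# 32 bits per dimension become 1 bit per dimension.
compression = embeddings.nbytes / codes.nbytes

# Query with a stored vector: its own code is at Hamming distance 0.
query_code = binarize(embeddings[42][None, :])[0]
nearest = int(np.argmin(hamming_distances(query_code, codes)))

print(compression, nearest)
```

The 32x memory figure falls straight out of the representation (float32 to 1 bit). Any speedup depends on the index and hardware, and some systems rescore the top candidates with the original vectors to recover accuracy.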
Aug 19 11 tweets 4 min read
JSON prompting for LLMs, clearly explained:

I used to think prompt engineering was dead!

Then I discovered JSON prompting and everything changed.

Today, I'll show you exactly what JSON prompting is and how it can drastically improve your AI outputs!

Let's dive in! 🚀
Aug 18 8 tweets 3 min read
MCP & A2A (Agent2Agent) protocol, clearly explained (with visuals):

Agentic applications require both A2A and MCP.

- MCP provides agents with access to tools.
- A2A allows agents to connect with other agents and collaborate in teams.

Today, let's clearly understand what A2A is and how it can work with MCP.
Aug 17 14 tweets 5 min read
This simple technique can scale training from 1 to 1,000+ GPUs.

- OpenAI uses it to train GPT models
- Google uses it on its TPUs to train Gemini
- Meta uses it to train Llamas on massive GPU clusters

Let's learn how to sync GPUs in multi-GPU training (with visuals):

One major run-time bottleneck in multi-GPU training happens during GPU synchronization.

For instance, in multi-GPU training via data parallelism:

- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.

Check this 👇
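Here's a single-process NumPy simulation of that setup. It assumes the synchronization step is gradient averaging across workers (an all-reduce), which this excerpt doesn't spell out. The linear model, data sizes, and worker count are made up for illustration. Every "GPU" holds the same weights, computes a gradient on its own shard, and the averaged result matches the gradient over the full batch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5))  # full batch
y = rng.standard_normal(64)
w = rng.standard_normal(5)        # identical weights on every worker


def mse_gradient(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


num_workers = 4  # simulated GPUs
shards = zip(np.split(X, num_workers), np.split(y, num_workers))

# Each worker sees a different, equally sized subset of the data.
local_grads = [mse_gradient(w, X_shard, y_shard) for X_shard, y_shard in shards]

# Simulated all-reduce: average the local gradients so all workers agree.
synced_grad = np.mean(local_grads, axis=0)

full_grad = mse_gradient(w, X, y)
print(np.allclose(synced_grad, full_grad))
```

After the sync, every worker applies the same update, so the model replicas stay identical. The communication this step requires is where the run-time bottleneck comes from.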