Simplifying LLMs, AI Agents, RAGs and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
Aug 22 • 14 tweets • 5 min read
Let's build an MCP server (100% local):
Before diving in, here's what we'll be doing today:
- Understand MCP with a simple analogy.
- Build a 100% local and secure MCP client using @mcpuse
- Integrate the client with the @Stagehanddev MCP server
- Use this setup to control and automate the browser
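For orientation, an MCP client is typically pointed at servers through a small JSON config. A hypothetical sketch (the package name and server key below are placeholders — use the ones from the Stagehand and mcp-use docs):

```json
{
  "mcpServers": {
    "stagehand": {
      "command": "npx",
      "args": ["-y", "stagehand-mcp-server"]
    }
  }
}
```

The client reads this config, spawns the server process over stdio, and exposes its browser tools to the LLM.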
Let's go! 🚀
Aug 21 • 15 tweets • 6 min read
A simple technique makes RAG up to 40x faster & 32x more memory-efficient!
- Perplexity uses it in its search index
- Google uses it in Vertex RAG engine
- Azure uses it in its search pipeline
Let's understand how to use it in a RAG system (with code):
Today, we're building a multi-agent legal assistant that can query 50M+ vectors in <30ms using Binary Quantization (BQ).
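The core idea behind Binary Quantization can be sketched in a few lines of NumPy: keep only the sign of each embedding dimension (1 bit instead of a 32-bit float — the 32x memory saving), then rank with cheap Hamming distance. A toy sketch, not the Milvus implementation:

```python
import numpy as np

def binarize(vecs):
    # Binary Quantization: keep only the sign of each dimension (1 bit per dim)
    return (vecs > 0).astype(np.uint8)

def hamming_distance(a, b):
    # Distance between binary codes = number of differing bits
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
docs = rng.standard_normal((4, 8))                # toy "embeddings": 4 docs, 8 dims
query = docs[2] + 0.05 * rng.standard_normal(8)   # near-duplicate of doc 2

doc_codes = binarize(docs)
q_code = binarize(query)

# Rank docs by Hamming distance to the query's binary code
best = min(range(len(docs)), key=lambda i: hamming_distance(doc_codes[i], q_code))
```

In production, the binary codes are usually used for a fast first-pass search, with the original float vectors re-ranking a small shortlist.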
Tech stack:
- @milvusio to self-host vectorDB with BQ
- @firecrawl_dev for web search
- @crewAIInc for orchestration
- @ollama to serve GPT-OSS
Let's go! 🚀
Aug 19 • 11 tweets • 4 min read
JSON prompting for LLMs, clearly explained:
I used to think prompt engineering was dead!
Then I discovered JSON prompting, and everything changed.
Today, I'll show you exactly what JSON prompting is and how it can drastically improve your AI outputs!
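A quick illustration of the idea (the task and schema here are made up): instead of burying instructions in prose, you make the task, constraints, and expected output shape explicit as JSON and send that as the user message.

```python
import json

# A JSON prompt makes the task, constraints, and output schema explicit,
# instead of hiding them in free-form prose.
prompt = {
    "task": "Summarize the customer review below",
    "input": "The battery lasts two days, but the screen scratches easily.",
    "constraints": {"max_words": 25, "tone": "neutral"},
    "output_format": {"summary": "string", "sentiment": "positive | negative | mixed"},
}

# Serialize and send as the user message to any chat-completion API
message = json.dumps(prompt, indent=2)
```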
Let's dive in! 🚀
Aug 18 • 8 tweets • 3 min read
MCP & A2A (Agent2Agent) protocol, clearly explained (with visuals):
Agentic applications require both A2A and MCP.
- MCP provides agents with access to tools.
- A2A allows agents to connect with other agents and collaborate in teams.
Today, let's clearly understand what A2A is and how it can work with MCP.
Aug 17 • 14 tweets • 5 min read
This simple technique can scale training from 1 to 1,000+ GPUs.
- OpenAI uses it to train GPT models
- Google uses it in their TPUs to train Gemini
- Meta uses it to train Llamas on massive GPU clusters
Let's learn how to sync GPUs in multi-GPU training (with visuals):
One major run-time bottleneck in multi-GPU training happens during GPU synchronization.
For instance, in multi-GPU training via data parallelism:
- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.
Check this 👇
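The synchronization step itself can be simulated on one machine: each "GPU" computes a gradient on its own shard, then an all-reduce averages the gradients so every replica applies the identical update. A pure-NumPy sketch (real systems do this with NCCL all-reduce, not a Python loop):

```python
import numpy as np

# Data parallelism, simulated: each "GPU" holds the same weights w
# but sees a different shard of the batch.
rng = np.random.default_rng(42)
w = rng.standard_normal(3)                  # shared model weights
X = rng.standard_normal((8, 3))             # full batch
y = X @ np.array([1.0, -2.0, 0.5])          # targets from a true linear model

shards = np.split(np.arange(8), 4)          # 4 GPUs, 2 samples each

def local_gradient(w, idx):
    # Gradient of mean-squared error on this GPU's shard only
    Xi, yi = X[idx], y[idx]
    return 2 * Xi.T @ (Xi @ w - yi) / len(idx)

# The sync step: all-reduce = average gradients across GPUs,
# so every replica takes the same step and the weights stay identical.
grads = [local_gradient(w, idx) for idx in shards]
avg_grad = np.mean(grads, axis=0)
w_new = w - 0.01 * avg_grad
```

With equal shard sizes, the averaged gradient is exactly the full-batch gradient — which is why data parallelism gives the same update as single-GPU training.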
Aug 15 • 10 tweets • 3 min read
Google just dropped a new LLM!
You can run it locally on just 0.5 GB RAM.
Let's fine-tune this on our own data (100% locally):
Google released Gemma 3 270M, a new model for hyper-efficient local AI!
We'll fine-tune this model to make it smart at chess: predicting the next move.
Tech stack:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.
Let's go! 🚀
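Before any fine-tuning, the data has to be shaped into prompt/completion pairs. A hypothetical example of how the chess task could be formatted (the actual dataset and prompt template are in the thread):

```python
# Hypothetical training-example format for next-move prediction.
def format_example(moves_so_far, next_move):
    prompt = "Given the chess moves so far, predict the next move.\n"
    prompt += "Moves: " + " ".join(moves_so_far) + "\n"
    return {"prompt": prompt, "completion": next_move}

example = format_example(["e4", "e5", "Nf3"], "Nc6")
```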
Aug 14 • 13 tweets • 4 min read
How LLMs work, clearly explained:
Before diving into LLMs, we must understand conditional probability.
Let's consider a population of 14 individuals:
- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither
Here's how it looks 👇
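The arithmetic is simple enough to check in code. The counts below are made up for illustration (the actual split is in the visual), but the formula P(Tennis | Football) = P(both) / P(Football) is the point:

```python
# Hypothetical counts for the population of 14:
total = 14
tennis = 7        # like Tennis
football = 6      # like Football
both = 3          # like both
neither = total - (tennis + football - both)   # like neither -> 4

# Conditional probability: restrict attention to football fans,
# then ask what fraction of them also like tennis.
p_tennis_given_football = both / football      # 3/6 = 0.5
```

An LLM does the same thing at scale: given the tokens so far (the condition), it outputs a probability for each possible next token.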
Aug 12 • 7 tweets • 3 min read
Traditional RAG vs. Agentic RAG, clearly explained (with visuals):
Traditional RAG has many issues:
- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.
- It cannot reason through complex queries.
- The system can't modify its strategy based on the problem.
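The contrast can be sketched as control flow. Every function below is a hypothetical stand-in (keyword match instead of a vector DB, a trivial rewrite instead of an LLM), but the loop is the essence of agentic RAG:

```python
def retrieve(query, corpus):
    # stand-in for vector search: naive keyword match
    return [doc for doc in corpus if any(w in doc.lower() for w in query.lower().split())]

def is_sufficient(context):
    # stand-in for an LLM judging whether the context answers the query
    return len(context) > 0

def rewrite_query(query):
    # stand-in for an LLM reformulating the query
    return query.replace("GPU sync", "gradient synchronization")

def generate(query, context):
    return f"Answer to {query!r} using {len(context)} docs"

def agentic_rag(query, corpus, max_rounds=3):
    # Traditional RAG: retrieve once, generate once.
    # Agentic RAG: loop -- retrieve, judge, rewrite -- until the context suffices.
    for _ in range(max_rounds):
        context = retrieve(query, corpus)
        if is_sufficient(context):
            return generate(query, context)
        query = rewrite_query(query)
    return "I don't know"

corpus = ["Gradients are averaged across replicas with all-reduce."]
result = agentic_rag("how does GPU sync work?", corpus)
```

The first retrieval misses, the agent rewrites the query, and the second pass succeeds — exactly the dynamic behavior traditional RAG lacks.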
Aug 10 • 13 tweets • 4 min read
Let's build a Browser Automation Agent using gpt-oss (100% local):
The browser is still the most universal interface, with 4.3 billion pages visited every day!
Here's a quick demo of how we can completely automate it!
Tech stack:
- @stagehanddev open-source AI browser automation
- @crewAIInc for orchestration
- @ollama to run gpt-oss
Let's go!🚀
Aug 9 • 12 tweets • 6 min read
I switched to AI Engineering 2 years ago!
It was the best career move I ever made.
If you want to start today, here's a roadmap:
1️⃣ Master Python
While many are busy vibe coding, those with strong coding fundamentals will always stand out.
Python is the language AI community speaks, and Harvard's CS50p is the best place to learn it.
Let's compare GPT-5 and Claude Opus-4.1 for code generation:
Today, we're building a CodeArena, where you can compare any two code-gen models side-by-side.
Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline
- @OpenRouterAI to access cutting-edge models
- @LightningAI for hosting CodeArena
Let's go!🚀
Aug 6 • 14 tweets • 5 min read
Let's compare OpenAI gpt-oss and Qwen-3 on maths & reasoning:
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @LiteLLM for orchestration
- @Cometml's Opik to build the eval pipeline (open-source)
- @OpenRouterAI to access the models
You'll also learn about G-Eval & building custom eval metrics.
Let's go! 🚀
Aug 5 • 13 tweets • 5 min read
Tech giants use Multimodal RAG every day in production!
- Spotify uses it to answer music queries
- YouTube uses it to turn prompts into tracks
- Amazon Music uses it to create playlists from prompts
Let's learn how to build a Multimodal Agentic RAG (with code):
Today, we'll build a multimodal Agentic RAG that can query documents and audio files using the user's speech.
Tech stack:
- @AssemblyAI for transcription.
- @milvusio as the vector DB.
- @beam_cloud for deployment.
- @crewAIInc Flows for orchestration.
Let's build it!
Aug 4 • 13 tweets • 4 min read
Sub-agents in Claude Code, clearly explained:
Claude Code subagents solve two of AI's biggest problems:
- Large-context management
- Right tool selection
This makes it the best AI coding assistant!
Let's understand how to build and use Sub-agents in Claude Code:
Aug 3 • 10 tweets • 4 min read
uv in Python, clearly explained (with code):
uv is incredibly fast.
- Creating virtual envs with uv is ~80x faster than python -m venv.
- Package installation is 4–12x faster without caching, and ~100x faster with caching.
Today, let's understand how to use uv for Python package management.
Let's dive in!
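For a taste, here are the core commands the thread walks through (annotated listing, assuming uv is already installed):

```shell
uv venv .venv                 # create a virtual environment (replaces python -m venv)
uv pip install requests       # install packages (pip-compatible interface)
uv init myproject             # scaffold a new project with pyproject.toml
uv add numpy                  # add a dependency to the project
uv run python main.py        # run a command inside the project environment
```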
Aug 1 • 14 tweets • 5 min read
Let's build a (Text2SQL + RAG), hybrid agentic workflow:
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI to validate the response
- @OpenRouterAI to access the latest Qwen3
Let's go! 🚀
Jul 31 • 17 tweets • 6 min read
"Attention is all you need" implemented from scratch using PyTorch:
This is the paper that revolutionized AI!
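The heart of the paper is one equation: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy version of that single step (the full thread builds multi-head attention and the rest of the Transformer on top of it):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # token-to-token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V, weights                         # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))   # 4 tokens, d_k = 8
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` sums to 1: every output token is a convex combination of the value vectors.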
I have been fine-tuning LLMs for more than 2 years now!
Here are the top 5 LLM fine-tuning techniques, explained with visuals:
Traditional fine-tuning is impractical for LLMs (billions of parameters; hundreds of GBs of memory).
Since this kind of computing isn't accessible to everyone, parameter-efficient finetuning (PEFT) came into existence.
Today, we’ll cover the top 5 PEFT techniques, step by step.
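To see why PEFT is so much cheaper, here is a toy sketch of LoRA, one of the most popular PEFT techniques: freeze the pretrained weight W and learn only a low-rank update BA.

```python
import numpy as np

# LoRA: freeze W, train only the low-rank factors A and B,
# so r*(d_in + d_out) parameters train instead of d_in*d_out.
d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # B starts at zero, so the update starts at 0

def forward(x):
    return W @ x + B @ (A @ x)           # adapted layer: (W + BA) x

x = rng.standard_normal(d_in)

trainable = A.size + B.size              # 512 parameters
frozen = W.size                          # 4096 parameters
```

Here only 512 of 4,608 parameters train (~11%); for a billion-parameter LLM the ratio is far more dramatic, which is what makes fine-tuning on a single GPU possible.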
Jul 25 • 10 tweets • 3 min read
How LLMs train LLMs, clearly explained (with visuals):
LLMs learn not only from raw text but also from other models.
Google’s Gemma 2 and 3, for example, were distilled from the larger Gemini model.
Today, we cover the three most common knowledge-distillation methods.
Let's dive in! 🚀
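The classic recipe (Hinton-style soft-label distillation) trains the student to match the teacher's softened output distribution via KL divergence. A minimal sketch:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                      # temperature T > 1 softens the distribution
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)   # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])
student_good = np.array([3.8, 1.1, 0.4])   # close to the teacher
student_bad = np.array([0.2, 3.0, 1.0])    # far from the teacher
```

The loss is lower for the student whose logits track the teacher's, which is exactly the gradient signal that transfers the larger model's knowledge.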
Jul 24 • 13 tweets • 4 min read
Let's build a "Chat with your Code" RAG app using Qwen3-Coder:
Before we begin, take a look at what we're about to create!
Tech stack:
- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI codex to validate the response
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.
Let's go! 🚀
Jul 23 • 15 tweets • 5 min read
I just built the ultimate MCP server for Multimodal AI.
It lets you do RAG over audio, video, images and text!
100% open-source, here's the full breakdown...👇
Before we dive in, here's a quick demo of what we're building!
Tech stack:
- @pixeltablehq to build the multi-modal AI infrastructure
- @crewAIInc to orchestrate the agentic workflow
Quickly check the thread, then return here for a detailed overview. 🚀