Avi Chawla
Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Sep 11 15 tweets 6 min read
Let's build a context engineering workflow, step by step: Today, we'll build a multi-agent research assistant using context engineering principles.

Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration

Let's go!
Sep 8 11 tweets 4 min read
I have been fine-tuning LLMs for over two years now!

Here are the top 5 LLM fine-tuning techniques, explained visually: Traditional fine-tuning is impractical for LLMs (billions of parameters; hundreds of GB).

Since this scale of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is used extensively.

Today, we'll cover the top 5 PEFT techniques, step by step.
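To make the idea concrete, here's a minimal LoRA sketch using Hugging Face's peft library. The base model and hyperparameters below are illustrative placeholders, not the thread's exact setup:

```python
# Minimal LoRA sketch with Hugging Face peft (illustrative hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works here

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the update
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```

The key point: only the small adapter matrices are trained, while the billions of base weights stay frozen.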
Sep 6 13 tweets 4 min read
Let's generate our own LLM fine-tuning dataset (100% local): Before we begin, here's what we're doing today!

We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?

Finally, we'll create our own instruction fine-tuning dataset.

Let's dive in!
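For a preview of what we're building toward, here's one hand-written record in the common Alpaca-style instruction format, written out as JSONL. The fields and values are purely illustrative:

```python
# A minimal, hand-written instruction fine-tuning record (Alpaca-style fields).
import json

records = [
    {
        "instruction": "Summarize the following text in one sentence.",
        "input": "KV caching stores attention keys and values so they are not recomputed.",
        "output": "KV caching speeds up LLM inference by reusing previously computed keys and values.",
    },
]

# Most fine-tuning libraries accept one JSON object per line (JSONL).
with open("instructions.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```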
Sep 4 11 tweets 4 min read
7 LLM generation parameters, clearly explained (with visuals): Every generation from an LLM is shaped by parameters under the hood.

Knowing how to tune them is important for producing sharper, more controlled outputs.

The visual shows 7 parameters that matter most.

Let's understand them one by one!
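Most of these parameters are exposed directly on chat-completion APIs. Here's a rough sketch with the OpenAI Python client; the values are arbitrary examples, not recommendations:

```python
# Illustrative generation parameters on a chat-completion call
# (values are arbitrary examples, not recommendations).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about vectors."}],
    temperature=0.7,        # randomness of sampling
    top_p=0.9,              # nucleus sampling cutoff
    max_tokens=100,         # cap on generated tokens
    frequency_penalty=0.2,  # discourage repeating tokens
    presence_penalty=0.1,   # encourage introducing new topics
    stop=["\n\n"],          # stop sequences
)
print(response.choices[0].message.content)
```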
Aug 28 9 tweets 3 min read
Temperature in LLMs, clearly explained (with code): Let's prompt OpenAI GPT-3.5 with a low temperature value twice.

The LLM produces identical responses both times.

Check the response below 👇
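Here's a minimal sketch of that experiment with the OpenAI Python client (the prompt is a placeholder). With temperature at or near 0, sampling becomes near-greedy, so repeated calls tend to return the same completion:

```python
# Prompt the same model twice with temperature 0; the two responses are
# (almost always) identical because sampling becomes near-greedy.
from openai import OpenAI

client = OpenAI()
prompt = "Explain temperature in LLMs in one sentence."

for i in range(2):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(f"Run {i + 1}: {resp.choices[0].message.content}")
```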
Aug 27 11 tweets 4 min read
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can integrate any of 100k+ tools.

Here's a step-by-step breakdown (100% local): To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
Aug 25 12 tweets 4 min read
I removed 74% of neurons from a neural network.

It dropped the accuracy by just 0.50%.

Here's a breakdown (with code): A trained neural network almost always has neurons that contribute little to its performance.

But they still consume memory.

These can be removed without significantly compromising accuracy.

Let's see how to identify them!
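One common way to find such neurons is magnitude-based pruning: rank neurons by the norm of their weights and zero out the weakest ones. A minimal NumPy sketch of the idea (layer sizes and the 74% ratio are illustrative):

```python
# Magnitude-based pruning sketch: zero out the weakest neurons of a dense layer.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256))               # one layer: 256 output neurons

neuron_strength = np.linalg.norm(W, axis=0)   # L2 norm of each neuron's incoming weights
k = int(0.74 * W.shape[1])                    # prune 74% of the neurons (as in the thread)
prune_idx = np.argsort(neuron_strength)[:k]   # indices of the weakest neurons

W_pruned = W.copy()
W_pruned[:, prune_idx] = 0.0                  # remove their contribution
print(f"Pruned {k}/{W.shape[1]} neurons")
```

In practice you would prune based on the trained weights (or activations on a validation set), then re-evaluate accuracy.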
Aug 23 11 tweets 4 min read
The growth of LLM context length with time:

- GPT-3.5-turbo → 4k tokens
- OpenAI GPT4 → 8k tokens
- Claude 2 → 100k tokens
- Llama 3 → 128k tokens
- Gemini → 1M tokens

Let's understand how they extend the context length of LLMs: In a traditional transformer, a model processing "8x" tokens requires 64 times more computation (quadratic growth) than one handling "x" tokens.

Thus, extending the context window isn't as simple as just increasing the size of the matrices.

Check this 👇
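The quadratic growth comes from self-attention comparing every token with every other token: 8x the tokens means 8² = 64x the pairwise scores. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: attention cost grows quadratically with context length.
def attention_pairs(n_tokens: int) -> int:
    # every token attends to every token -> n^2 pairwise scores
    return n_tokens * n_tokens

base = attention_pairs(4_000)
longer = attention_pairs(32_000)   # 8x more tokens
print(longer / base)               # -> 64.0
```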
Aug 22 10 tweets 3 min read
You are in an ML interview.

Your interviewer asks: "Why is Kernel Trick called a Trick?"

Here's how to answer (with simple maths): Many ML algorithms use kernels for robust modeling, like SVM and KernelPCA.

If we have two n-dimensional vectors, a kernel function lets us compute their dot product in m-dimensional space (m>>n) without explicitly projecting the vectors.

Let's understand more with maths!
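A concrete check with a degree-2 polynomial kernel: computing (x·y)² in the original space gives exactly the same number as an explicit dot product in the much larger space of all pairwise products.

```python
# Kernel trick sanity check: (x . y)^2 equals the dot product of the
# explicit degree-2 feature maps phi(x) and phi(y).
import numpy as np

def phi(v):
    # explicit feature map for the degree-2 polynomial kernel:
    # all pairwise products v_i * v_j (n dims -> n^2 dims)
    return np.outer(v, v).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

kernel_value = np.dot(x, y) ** 2          # computed in the original 3-D space
explicit_value = np.dot(phi(x), phi(y))   # computed in the 9-D feature space

print(kernel_value, explicit_value)       # both print the same number
```

That's the "trick": we get the high-dimensional geometry without ever materializing the high-dimensional vectors.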
Aug 20 12 tweets 4 min read
DeepMind built a simple RAG technique that:

- reduces hallucinations by 40%
- improves answer relevancy by 50%

Let's understand how to use it in RAG systems (with code): Most RAG apps fail due to retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:

- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration

Let's go!
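The core self-correction step is simple in isolation: grade each retrieved chunk for relevance and fall back to another source when retrieval looks weak. A framework-agnostic sketch of that grading loop; the model, prompt, thresholds, and the web_search stub are illustrative placeholders, not the thread's exact implementation:

```python
# Sketch of a retrieval self-correction step: grade each retrieved chunk with
# an LLM, and fall back to another retriever if too few chunks pass.
from openai import OpenAI

client = OpenAI()

def web_search(question: str) -> list[str]:
    # placeholder for a real fallback retriever (e.g. a web-search API)
    return []

def grade_chunk(question: str, chunk: str) -> bool:
    prompt = (
        "Does the passage help answer the question? Answer yes or no.\n"
        f"Question: {question}\nPassage: {chunk}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def correct_retrieval(question: str, chunks: list[str]) -> list[str]:
    relevant = [c for c in chunks if grade_chunk(question, c)]
    if len(relevant) < 2:              # retrieval looks weak -> fall back
        relevant += web_search(question)
    return relevant
```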
Aug 17 11 tweets 4 min read
Model Context Protocol (MCP), clearly explained (with visuals): MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
Aug 14 12 tweets 4 min read
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Here's a complete breakdown (with visuals): RAG is 80% retrieval and 20% generation.

So if RAG isn't working, it's most likely a retrieval issue, which usually originates in chunking and embedding.

Contextualized chunk embedding models solve this.

Let's dive in to learn more!
Aug 11 10 tweets 3 min read
Let's fine-tune OpenAI gpt-oss (100% locally): Today, let's learn how to fine-tune OpenAI's latest gpt-oss locally.

We'll give it multilingual reasoning capabilities as shown in the video.

We'll use:
- @UnslothAI for efficient fine-tuning.
- @huggingface transformers to run it locally.

Let's begin!
Aug 8 12 tweets 4 min read
Enterprises build RAG over 100s of data sources, not one!

- Microsoft ships it in M365 products.
- Google ships it in its Vertex AI Search.
- AWS ships it in its Amazon Q Business.

Let's build an MCP-powered RAG over 200+ sources (100% local): Enterprise data is scattered across many sources.

Today, we'll build a unified MCP server that can query 200+ sources from one interface.

Tech stack:
- @mcpuse to build a local MCP client
- @MindsDB to connect to data sources
- @ollama to serve GPT-oss locally

Let's begin!
Aug 7 11 tweets 4 min read
I have been building AI Agents in production for over a year.

If you want to learn too, here's a simple tutorial (hands-on): Today, we'll build and deploy a Coding Agent that can scrape docs, write production-ready code, solve issues and raise PRs, directly from Slack.

Tech stack:
- Claude Code for code generation
- @xpander_ai as the Agent backend
- @firecrawl_dev for scraping

Let's begin!
Aug 6 14 tweets 7 min read
12 MCP, RAG, and Agents cheat sheets for AI engineers (with visuals): 1️⃣ Function calling & MCP for LLMs

Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs.

The visual covers how Function Calling & MCP work under the hood.

Check the thread below 👇
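For reference, classic function calling means describing your tools as JSON schemas that the model can choose to invoke. A minimal sketch with the OpenAI client; the get_weather tool is a made-up example:

```python
# Classic function calling: describe a tool as a JSON schema and let the
# model decide whether to call it. The get_weather tool is a made-up example.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool call, if any
```

MCP standardizes the same idea across clients and servers, so tools don't have to be re-declared per app.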
Aug 4 14 tweets 5 min read
A simple technique makes RAG ~32x more memory efficient!

- Perplexity uses it in its search index
- Azure uses it in its search pipeline
- HubSpot uses it in its AI assistant

Let's understand how to use it in RAG systems (with code): Today, let's build a RAG system that queries 36M+ vectors in <30ms using Binary Quantization.

Tech stack:
- @llama_index for orchestration
- @milvusio as the vector DB
- @beam_cloud for serverless deployment
- @Kimi_Moonshot Kimi-K2 as the LLM hosted on Groq

Let's build it!
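Binary Quantization itself is almost a one-liner: keep only the sign of each embedding dimension, which turns a 32-bit float into a single bit (the ~32x memory saving) and lets you compare vectors with cheap Hamming distance. A NumPy sketch with toy embeddings:

```python
# Binary quantization sketch: keep only the sign of each embedding dimension.
# float32 (32 bits/dim) -> 1 bit/dim, compared via Hamming distance.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768)).astype(np.float32)   # toy embeddings

binary = (embeddings > 0).astype(np.uint8)   # 1 bit of information per dimension
packed = np.packbits(binary, axis=1)         # store 8 dimensions per byte

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # XOR the packed bits and count the ones
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

query = (rng.normal(size=768) > 0).astype(np.uint8)
packed_query = np.packbits(query)
scores = [hamming_distance(packed_query, row) for row in packed]
print(np.argsort(scores)[:5])                # 5 nearest neighbours by Hamming distance
```

Production systems typically use the binary index for a fast first pass and rescore the top candidates with full-precision vectors.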
Aug 3 14 tweets 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML (GNN) in recommendation
- Spotify uses graph ML (HGNNs) in recommendation
- Pinterest uses graph ML (PingSage) in recommendation

Here are 6 must-know techniques for graph feature engineering (with code): Just as image, text, and tabular datasets have features, so do graph datasets.

This means when building models on graph datasets, we can engineer these features to achieve better performance.

Let's discuss some feature engineering techniques below!
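As a small illustration, libraries like NetworkX give you several classic node-level features in one call each (degree, clustering coefficient, PageRank). The graph here is a toy built-in example:

```python
# Classic node-level graph features on a toy graph with NetworkX.
import networkx as nx

G = nx.karate_club_graph()        # small built-in social graph

degree = dict(G.degree())         # how many neighbours each node has
clustering = nx.clustering(G)     # how tightly each node's neighbours connect
pagerank = nx.pagerank(G)         # importance based on link structure

# Assemble a per-node feature row, ready to feed into a downstream model.
features = {
    n: [degree[n], clustering[n], pagerank[n]]
    for n in G.nodes()
}
print(features[0])
```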
Jul 30 12 tweets 5 min read
I have tested 100+ MCP servers in the last 3 months!

Let's use the best 6 to build an ultimate AI assistant for devs (100% local): Today, we'll build a local ultimate AI assistant using:

- @mcpuse to connect LLM to MCP servers
- @Stagehanddev MCP for browser access
- @firecrawl_dev MCP for scraping
- @ragieai MCP for multimodal RAG
- @zep_ai Graphiti MCP as memory
- Terminal & GitIngest MCP

Let's dive in!
Jul 27 11 tweets 4 min read
KV caching in LLMs, clearly explained (with visuals): KV caching is a technique used to speed up LLM inference.

Before understanding the internal details, look at the inference speed difference in the video:

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in!
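A minimal way to see the effect yourself with Hugging Face transformers: generate once with the cache enabled and once with it disabled, and time both. The model choice is illustrative, and exact speedups depend on hardware and sequence length:

```python
# KV caching in practice: generation with and without the cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("KV caching speeds up decoding because", return_tensors="pt")

for use_cache in (True, False):
    start = time.time()
    model.generate(**inputs, max_new_tokens=128, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.time() - start:.2f}s")
```

With the cache on, each new token reuses the keys and values of all previous tokens instead of recomputing them from scratch.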
Jul 26 9 tweets 3 min read
5 levels of Agentic AI systems, clearly explained (with visuals): Agentic AI systems don't just generate text; they can make decisions, call functions, and even run autonomous workflows.

The visual explains 5 levels of AI agency, starting from simple responders to fully autonomous agents.

Let's dive in to learn more!