Avi Chawla
Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Dec 12 • 14 tweets • 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML for recommendations
- Spotify uses graph ML for recommendations
- Pinterest uses graph ML for recommendations

Here are 6 must-know graph feature engineering techniques (with code):

Just as image, text, and tabular datasets have features, so do graph datasets.

This means when building models on graph datasets, we can engineer these features to achieve better performance.

Let's discuss some feature engineering techniques below!
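To make this concrete, here's a minimal sketch of node-level feature engineering using networkx (the library choice is mine; the thread's own code may differ):

```python
# A minimal sketch of node-level feature engineering with networkx
# (the library choice is mine; the thread's own code may differ).
import networkx as nx

G = nx.karate_club_graph()          # toy social graph bundled with networkx

degree     = dict(G.degree())       # local connectivity
pagerank   = nx.pagerank(G)         # global importance
clustering = nx.clustering(G)       # how tightly knit a node's neighborhood is

# One feature row per node, ready for any tabular model
features = {n: [degree[n], pagerank[n], clustering[n]] for n in G.nodes}
print(features[0])                  # e.g. [16, 0.097..., 0.15]
```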
Dec 10 • 10 tweets • 4 min read
You're in an AI Engineer interview at OpenAI.

The interviewer asks:

"Our GPT model generates 100 tokens in 42 seconds.

How do you make it 5x faster?"

You: "I'll allocate more GPUs for faster generation."

Interview over.

Here's what you missed:

The real bottleneck isn't compute; it's redundant computation.

Without KV caching, your model recomputes keys and values for every previous token at each decoding step, repeating the same work over and over.

- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)

Let's dive in to understand how it works!
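Here's the idea in a toy numpy sketch (illustrative only, not any production implementation): at each decode step, only the newest token's K/V are computed; everything else is reused.

```python
# Toy single-head decoder step showing the idea behind KV caching.
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = rng.standard_normal((3, d, d))
K_cache, V_cache = [], []

def decode_step(x_new):
    """x_new: embedding of the newest token only, shape (d,)."""
    q = x_new @ Wq
    # Compute K/V for the NEW token only; reuse everything cached so far.
    K_cache.append(x_new @ Wk)
    V_cache.append(x_new @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = np.exp(q @ K.T / np.sqrt(d))
    return (scores / scores.sum()) @ V   # context vector for the next token

for _ in range(5):                       # five decode steps
    out = decode_step(rng.standard_normal(d))

# Without the cache, step t would recompute K/V for all t previous tokens:
# O(t) extra work per step, O(T^2) overall — the gap behind 42s vs 9s above.
```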
Dec 7 • 12 tweets • 5 min read
You're in a Research Scientist interview at OpenAI.

The interviewer asks:

"How would you expand the context length of an LLM from 2K to 128K tokens?"

You: "I will fine-tune the model on longer docs with 128K context."

Interview over.

Here's what you missed:

Extending the context window isn't just about larger matrices.

In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!

So, how do we manage it?

continue...👇
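First, a quick back-of-envelope check of that quadratic growth (my numbers; fp16 assumed):

```python
# Back-of-envelope: the attention score matrix is n x n per head, so 8x
# the tokens means 64x the entries (fp16 assumed, 2 bytes per entry).
for n in (2_000, 16_000, 128_000):
    gb = n * n * 2 / 1e9
    print(f"{n:>7} tokens -> {gb:8.1f} GB per head, per layer")
# 2,000 -> 0.0 GB;  16,000 -> 0.5 GB;  128,000 -> 32.8 GB
```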
Nov 25 • 10 tweets • 4 min read
Context engineering, clearly explained (with visuals):

(an illustrated guide below)

So, what is context engineering?

It’s the art and science of delivering the right information, in the right format, at the right time, to your LLM.

Here's a quote by Andrej Karpathy on context engineering...👇
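To ground the definition, here's a minimal, hypothetical sketch of what context assembly can look like in code (both helpers are stubs standing in for your real retriever and memory store):

```python
# A minimal, hypothetical sketch of context assembly. The two helpers are
# stubs — swap in your real retriever and memory store.
def retrieve_top_k(query: str, k: int = 3) -> list[str]:
    return ["Fact A about the topic.", "Fact B.", "Fact C."][:k]

def recall_user_memory(query: str) -> str:
    return "The user prefers concise, technical answers."

def build_context(query: str) -> str:
    docs = retrieve_top_k(query)                  # the RIGHT information
    memory = recall_user_memory(query)            # at the RIGHT time
    return (                                      # in the RIGHT format
        "## Retrieved facts\n" + "\n".join(docs)
        + f"\n\n## About the user\n{memory}"
        + f"\n\n## Question\n{query}"
    )

print(build_context("How does KV caching speed up decoding?"))
```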
Oct 27 • 10 tweets • 5 min read
8 key skills to master LLM Engineering:

(free/open-source resources below)

1️⃣ Prompt engineering

Prompt engineering is far from dead!

The key is to craft structured prompts that reduce ambiguity and produce consistent, predictable outputs.

Treat it as engineering, not copywriting!

Here's something I published on JSON prompting:
Oct 24 • 12 tweets • 4 min read
Let's build a reasoning LLM using GRPO, from scratch (100% local):

Today, we're going to learn how to turn any model into a reasoning powerhouse.

We'll do so without any labeled data or human intervention, using reinforcement fine-tuning with GRPO (Group Relative Policy Optimization)!

Tech stack:

- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO

Let's go! 🚀
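As a preview, here's roughly what the TRL side looks like — a minimal GRPO sketch with an illustrative toy reward, model choice, and hyperparameters, not the thread's exact setup:

```python
# A minimal GRPO sketch with Hugging Face TRL (model choice, toy reward,
# and hyperparameters are illustrative — not the thread's exact setup).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

train = Dataset.from_dict({"prompt": ["What is 13 * 7? Think step by step."] * 64})

def correctness_reward(completions, **kwargs):
    # Verifiable reward: no labels or human feedback, just a check.
    return [1.0 if "91" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",      # any small causal LM works
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
    train_dataset=train,
)
trainer.train()
```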
Oct 5 • 11 tweets • 4 min read
JSON prompting for LLMs, clearly explained:

Today, let's understand what exactly JSON prompting is and how it can drastically improve your AI outputs!

The visual below gives a head-to-head comparison with traditional prompting.

Let's dive in!
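Here's the kind of prompt we're talking about (the schema fields and review are my own example):

```python
# What a JSON prompt looks like in practice (schema fields are my example).
prompt = """Extract product details from the review below.
Respond with ONLY valid JSON matching this schema:
{
  "product": "string",
  "sentiment": "positive | negative | mixed",
  "issues": ["string"]
}

Review: "The UltraBrew kettle boils fast, but the lid rattles loudly."
"""
# Expected output — structured and parseable, no prose to strip:
# {"product": "UltraBrew kettle", "sentiment": "mixed", "issues": ["lid rattles"]}
```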
Sep 19 • 8 tweets • 4 min read
I've been coding in Python for 9 years now.

If I were to start over today, here's a complete roadmap:

1️⃣ Python bootcamp by @freeCodeCamp

A 4-hour Python bootcamp with over 46M views! It covers:

- Installing Python
- Setting up an IDE
- Basic Syntax
- Variables & Datatypes
- Looping in Python
- Exception handling
- Modules & pip
- Mini hands-on projects

Check this out👇
Sep 14 • 14 tweets • 5 min read
Let's build a multi-agent brand monitoring system (100% local):

Today, we're building a brand monitoring app to gain local-to-global insights from campaign and product feedback.

Tech stack:

- Bright Data to scrape data at scale
- @beam_cloud for deployment
- @crewAIInc for orchestration
- @ollama to serve LLM locally

Let's go!
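For a taste of the orchestration layer, here's a minimal CrewAI sketch (the agent role and Ollama model string are illustrative; the scraping and deployment pieces are omitted):

```python
# A minimal CrewAI sketch of the orchestration layer (agent role and the
# Ollama model string are illustrative; scraping/deployment are omitted).
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Brand analyst",
    goal="Summarize sentiment in scraped brand mentions",
    backstory="You turn raw social posts into concise insight reports.",
    llm="ollama/llama3.1",   # served locally via Ollama
)

report = Task(
    description="Analyze these mentions and flag emerging complaints: {mentions}",
    expected_output="A short report: top positives, negatives, and trends.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[report])
print(crew.kickoff(inputs={"mentions": "…scraped posts go here…"}))
```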
Sep 12 • 17 tweets • 6 min read
- All Meta Llama models use Attention
- All OpenAI GPT models use Attention
- All Alibaba Qwen models use Attention
- All Google Gemma models use Attention

Let's learn how to implement it from scratch:

"Attention Is All You Need" is the paper that revolutionized AI!

Today, we'll implement:

- The complete Transformer architecture
- Multi-Head Attention mechanism
- Encoder-Decoder structure
- Positional Encoding

Everything in clean, educational Python code!

Let's go!
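At the heart of it all is scaled dot-product attention, which fits in a few lines (single head shown here; the thread builds up the full multi-head version):

```python
# Scaled dot-product attention — the core of the paper — in a few lines.
# (Single head; the thread builds the full multi-head version on top.)
import math
import torch

def attention(Q, K, V, mask=None):
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

x = torch.randn(1, 10, 64)       # (batch, seq_len, d_model)
print(attention(x, x, x).shape)  # self-attention: Q = K = V -> [1, 10, 64]
```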
Sep 11 • 15 tweets • 6 min read
Let's build a context engineering workflow, step by step:

Today, we'll build a multi-agent research assistant using context engineering principles.

Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration

Let's go!
Sep 8 • 11 tweets • 4 min read
I have been fine-tuning LLMs for over two years now!

Here are the top 5 LLM fine-tuning techniques, explained visually:

Traditional fine-tuning is impractical for LLMs (billions of parameters; checkpoints spanning hundreds of GB).

Since that kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) is used extensively.

Today, we'll cover the top 5 PEFT techniques, step by step.
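As a teaser, here's LoRA — the most widely used PEFT technique — via the peft library (r=8 and GPT-2's "c_attn" module are typical illustrative choices, not the thread's exact config):

```python
# LoRA via the peft library (r=8 and GPT-2's "c_attn" are typical choices).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(model, config)

model.print_trainable_parameters()
# -> trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
```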
Sep 6 • 13 tweets • 4 min read
Let's generate our own LLM fine-tuning dataset (100% local):

Before we begin, here's what we're doing today!

We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?

Finally, we'll create our own instruction fine-tuning dataset.

Let's dive in!
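For reference, here's what a single row of such a dataset typically looks like (Alpaca-style format; the content is a made-up example):

```python
# One row of an instruction fine-tuning dataset (Alpaca-style format;
# the content here is a made-up example).
example = {
    "instruction": "Summarize the text in one sentence.",
    "input": "KV caching stores keys/values from earlier decode steps so the "
             "model doesn't recompute them for every new token.",
    "output": "KV caching reuses past keys and values to avoid redundant compute.",
}
```

Generating such rows locally typically means prompting a local LLM to produce (instruction, input, output) triples from seed topics, then deduplicating and filtering.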
Sep 4 • 11 tweets • 4 min read
7 LLM generation parameters, clearly explained (with visuals):

Every generation from an LLM is shaped by parameters under the hood.

Knowing how to tune them is important for producing sharper, more controlled outputs.

The visual shows the 7 parameters that matter most.

Let's understand them one by one!
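For orientation, here's where several of these knobs live in a Hugging Face `generate` call (the model and values are illustrative, not recommendations):

```python
# Several of the main generation knobs in one `generate` call
# (GPT-2 and the values below are illustrative, not recommendations).
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The key generation parameters are", return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=50,        # hard cap on response length
    temperature=0.7,          # flatten/sharpen the token distribution
    top_k=50,                 # sample only from the 50 likeliest tokens
    top_p=0.9,                # ...or the smallest set covering 90% of the mass
    repetition_penalty=1.1,   # discourage repeating earlier tokens
    do_sample=True,           # sampling on; False = greedy decoding
)
print(tok.decode(output[0], skip_special_tokens=True))
```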
Aug 28 • 9 tweets • 3 min read
Temperature in LLMs, clearly explained (with code):

Let's prompt OpenAI GPT-3.5 with a low temperature value twice.

It produces virtually identical responses from the LLM.

Check the response below👇
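If you want to reproduce this, here's a sketch with the OpenAI Python SDK (v1 style; assumes OPENAI_API_KEY is set, and the prompt is my own example):

```python
# Reproducing the experiment with the OpenAI Python SDK (v1 style).
# Assumes OPENAI_API_KEY is set; prompt is my own example.
from openai import OpenAI

client = OpenAI()
for _ in range(2):
    r = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Describe the moon in one line."}],
        temperature=0,   # near-greedy decoding: repeated runs almost always match
    )
    print(r.choices[0].message.content)
```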
Aug 27 • 11 tweets • 4 min read
There's a new way to build production-grade MCP servers.

- It takes less than a minute.
- You don't have to write any code.
- You can integrate with 100k+ tools.

Here's a step-by-step breakdown (100% local):

To build MCP servers from scratch with custom tools, one has to:

- read the API docs
- implement MCP tools
- test them, and much more

Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).

Let's dive in!
Aug 25 • 12 tweets • 4 min read
I removed 74% of neurons from a neural network.

It dropped the accuracy by just 0.50%.

Here's a breakdown (with code):

A trained neural network always has neurons that don't contribute substantially to its performance.

But they still consume memory.

These can be removed without significantly compromising accuracy.

Let's see how to identify them!
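The simplest version of this is magnitude pruning, which PyTorch supports out of the box (74% chosen here to mirror the experiment above; the thread's exact layers and criteria may differ):

```python
# Magnitude pruning with PyTorch's built-in utility (74% mirrors the
# experiment above; the thread's exact layers/criteria may differ).
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.74)  # zero the smallest 74%
prune.remove(layer, "weight")                             # make the mask permanent

sparsity = (layer.weight == 0).float().mean()
print(f"{sparsity:.0%} of weights removed")               # -> 74%
```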
Aug 23 • 11 tweets • 4 min read
The growth of LLM context length with time:

- GPT-3.5-turbo → 4k tokens
- OpenAI GPT-4 → 8k tokens
- Claude 2 → 100k tokens
- Llama 3.1 → 128k tokens
- Gemini 1.5 Pro → 1M tokens

Let's understand how they extend the context length of LLMs:

In a traditional transformer, a model processing 8x tokens requires 64 times more computation (quadratic growth) than one handling x tokens.

So, extending the context window isn't as simple as just making the matrices bigger.

Check this 👇
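One common trick (used with RoPE-based models) is position interpolation: squeeze the new, longer position range back into the range the model was trained on, then fine-tune briefly. A minimal sketch of the idea:

```python
# Position interpolation in one function: squeeze new positions back into
# the 0..trained_len range the model already understands (sketch of the
# idea only; real implementations scale RoPE angles inside the model).
def interpolated_positions(seq_len: int, trained_len: int = 2048) -> list[float]:
    scale = trained_len / max(seq_len, trained_len)
    return [i * scale for i in range(seq_len)]

print(interpolated_positions(8192)[:4])   # [0.0, 0.25, 0.5, 0.75]
# A short fine-tune then teaches the model these fractional positions.
```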
Aug 22 • 10 tweets • 3 min read
You are in an ML interview.

Your interviewer asks: "Why is the kernel trick called a 'trick'?"

Here's how to answer (with simple maths):

Many ML algorithms use kernels for robust modeling, such as SVM and kernel PCA.

If we have two n-dimensional vectors, a kernel function lets us compute their dot product in m-dimensional space (m>>n) without explicitly projecting the vectors.

Let's understand more with maths!
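Here's the maths, numerically, for the polynomial kernel (x·y + 1)² in 2D: the kernel gives the same number as an explicit 6-dimensional projection, without ever building it.

```python
# The trick, numerically: the polynomial kernel (x·y + 1)^2 equals a dot
# product in a 6-D feature space we never have to build.
import numpy as np

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])

def phi(v):
    # Explicit 2-D -> 6-D feature map corresponding to (x·y + 1)^2
    a, b = v
    return np.array([1, np.sqrt(2)*a, np.sqrt(2)*b, a*a, b*b, np.sqrt(2)*a*b])

print(phi(x) @ phi(y))      # project first, then dot product: 144.0
print((x @ y + 1) ** 2)     # the "trick": same number, all work done in 2-D
```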
Aug 20 • 12 tweets • 4 min read
DeepMind built a simple RAG technique that:

- reduces hallucinations by 40%
- improves answer relevancy by 50%

Let's understand how to use it in RAG systems (with code):

Most RAG apps fail because of poor retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:

- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration

Let's go!
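The self-correction loop, in miniature (the `llm`, `web_search`, and `db_search` hooks below are stubs standing in for the stack above, so the sketch runs as-is):

```python
# The self-correction loop in miniature. `llm`, `web_search`, and
# `db_search` are STUBS standing in for the real stack above
# (Ollama LLM, Milvus search, Firecrawl) so the sketch runs as-is.
llm        = lambda prompt: "yes"                       # placeholder judge/answerer
web_search = lambda q: [f"Fresh web result about {q}"]  # placeholder fallback
db_search  = lambda q, k=3: ["chunk A", "chunk B"]      # placeholder retrieval

def answer(query):
    docs = db_search(query)
    # 1) Grade each retrieved chunk; keep only what the LLM deems relevant.
    good = [d for d in docs
            if llm(f"Is this relevant to '{query}'? yes/no:\n{d}").startswith("yes")]
    # 2) If retrieval failed, self-correct: fall back to fresh web results.
    if not good:
        good = web_search(query)
    # 3) Answer strictly from the vetted context.
    return llm(f"Answer '{query}' using only:\n" + "\n".join(good))

print(answer("What changed in the latest campaign?"))
```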
Aug 17 • 11 tweets • 4 min read
Model Context Protocol (MCP), clearly explained (with visuals):

MCP is like a USB-C port for your AI applications.

Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.

Let's dive in! 🚀
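To make the analogy concrete, here's a minimal MCP server using the official Python SDK's FastMCP helper (the weather tool is a made-up example):

```python
# A minimal MCP server with the official Python SDK — one "accessory" an
# AI app can plug into (the weather tool is a made-up example).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city}."

if __name__ == "__main__":
    mcp.run()   # speaks the standardized protocol over stdio
```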