Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Dec 12 • 14 tweets • 4 min read
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML in recommendation
- Spotify uses graph ML in recommendation
- Pinterest uses graph ML in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just as image, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
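For a taste, here's a minimal sketch (assuming networkx and a toy built-in graph) of three classic node-level features — degree, clustering coefficient, and PageRank:

```python
# Minimal node-level feature engineering sketch with networkx.
# The karate club graph is a toy stand-in for a real graph dataset.
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()  # small built-in social graph

features = pd.DataFrame({
    "degree": dict(G.degree()),      # number of neighbors per node
    "clustering": nx.clustering(G),  # local clustering coefficient
    "pagerank": nx.pagerank(G),      # global importance score
})

print(features.head())
```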
Dec 10 • 10 tweets • 4 min read
You're in an AI Engineer interview at OpenAI.
The interviewer asks:
"Our GPT model generates 100 tokens in 42 seconds.
How do you make it 5x faster?"
You: "I'll allocate more GPUs for faster generation."
Interview over.
Here's what you missed:
The real bottleneck isn't compute, it's redundant computation.
Without KV caching, your model recomputes the keys and values of every previous token at each generation step, repeating the same work again and again.
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in to understand how it works!
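A rough way to see this yourself — a minimal sketch with Hugging Face transformers (model, prompt, and token count are illustrative; exact timings will vary by hardware):

```python
# Compare generation time with and without the KV cache.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any causal LM works for the demo
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("The quick brown fox", return_tensors="pt")

for use_cache in (False, True):
    start = time.time()
    model.generate(**inputs, max_new_tokens=100, use_cache=use_cache)
    print(f"use_cache={use_cache}: {time.time() - start:.1f}s")
```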
Dec 7 • 12 tweets • 5 min read
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
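A quick back-of-the-envelope check of that quadratic growth, counting only the attention-score matrix (real memory also depends on layers, heads, and dtype):

```python
# One attention score per (query, key) pair -> seq_len^2 entries.
def attn_matrix_entries(seq_len: int) -> int:
    return seq_len * seq_len

base = attn_matrix_entries(2_048)
longer = attn_matrix_entries(2_048 * 8)      # 8x more tokens
print(longer / base)                         # -> 64.0, i.e. 64x more attention scores

full = attn_matrix_entries(131_072)          # 128K context
print(full / base)                           # -> 4096.0
```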
Let's build a reasoning LLM using GRPO, from scratch (100% local):
Today, we're going to learn how to turn any model into a reasoning powerhouse.
We'll do so without any labeled data or human intervention, using reinforcement fine-tuning with GRPO (Group Relative Policy Optimization)!
Tech stack:
- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO
Let's go! 🚀
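As a preview, here's a heavily pared-down GRPO sketch with TRL (the model name, toy reward function, and hyperparameters are illustrative assumptions — the full thread adds Unsloth and proper rewards):

```python
# Minimal GRPO loop with Hugging Face TRL; APIs may shift slightly between versions.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

dataset = Dataset.from_dict({"prompt": ["What is 13 * 7?", "Name a prime number > 20."]})

def format_reward(completions, **kwargs):
    # Toy reward: 1.0 if the completion contains a reasoning block, else 0.0
    return [1.0 if "<think>" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # any small causal LM for the demo
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="grpo-demo", num_generations=4, max_completion_length=128),
    train_dataset=dataset,
)
trainer.train()
```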
Oct 5 • 11 tweets • 4 min read
JSON prompting for LLMs, clearly explained:
Today, let's understand what exactly JSON prompting is and how it can drastically improve your AI outputs!
The visual below gives a head-to-head comparison with traditional prompting.
Let's dive in!
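To make it concrete, a small sketch of JSON prompting with the OpenAI SDK (model name and schema are just examples):

```python
# Ask for a fixed JSON schema so the output is easy to parse and validate.
import json
from openai import OpenAI

client = OpenAI()

prompt = """Summarize the review below. Respond ONLY with JSON using this schema:
{"sentiment": "positive|negative|neutral", "key_points": ["..."], "rating": 1-5}

Review: The battery life is great, but the screen scratches far too easily."""

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # enforce valid JSON output
)
print(json.loads(resp.choices[0].message.content))
```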
Sep 19 • 8 tweets • 4 min read
I've been coding in Python for 9 years now.
If I were to start over today, here's a complete roadmap:
1️⃣ Python bootcamp by @freeCodeCamp
A 4-hour Python bootcamp with over 46M views! It covers:
- Installing Python
- Setting up an IDE
- Basic Syntax
- Variables & Datatypes
- Looping in Python
- Exception handling
- Modules & pip
- Mini hands-on projects
Check this out👇
Sep 14 • 14 tweets • 5 min read
Let's build a multi-agent brand monitoring system (100% local):
Today, we're building a brand monitoring app to gain local-to-global insights from campaign and product feedback.
Tech stack:
- Bright Data to scrape data at scale
- @beam_cloud for deployment
- @crewAIInc for orchestration
- @ollama to serve LLM locally
Let's go!
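For a flavor of the orchestration layer, here's a stripped-down sketch of one CrewAI agent (the model string and prompts are assumptions; scraping and deployment are omitted):

```python
# One CrewAI agent analyzing brand mentions with a locally served model.
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Brand Sentiment Analyst",
    goal="Summarize how customers talk about our latest campaign",
    backstory="You turn raw social-media mentions into concise insight reports.",
    llm="ollama/llama3.2",  # assumption: any model served locally via Ollama
)

task = Task(
    description="Analyze these mentions and report overall sentiment: {mentions}",
    expected_output="A short report with sentiment breakdown and key themes",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
print(crew.kickoff(inputs={"mentions": "Love the new feature! / Shipping was slow..."}))
```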
Sep 12 • 17 tweets • 6 min read
- All Meta Llama models use Attention
- All OpenAI GPT models use Attention
- All Alibaba Qwen models use Attention
- All Google Gemma models use Attention
Let's learn how to implement it from scratch:
This is the paper that revolutionized AI!
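The core of it fits in a few lines — single-head scaled dot-product attention in PyTorch, without masking or projections for brevity:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k**0.5  # similarity of each query to every key
    weights = F.softmax(scores, dim=-1)          # normalize into attention weights
    return weights @ V                           # weighted sum of values

Q = K = V = torch.randn(1, 6, 64)   # (batch, seq_len, d_k)
print(attention(Q, K, V).shape)     # torch.Size([1, 6, 64])
```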
Let's build a context engineering workflow, step by step:
Today, we'll build a multi-agent research assistant using context engineering principles.
Tech stack:
- @tensorlake to get RAG-ready data from complex docs
- @zep_ai for memory
- @firecrawl_dev for web search
- @milvusio for vector DB
- @crewAIInc for orchestration
Let's go!
Sep 8 • 11 tweets • 4 min read
I have been fine-tuning LLMs for over two years now!
Here are the top 5 LLM fine-tuning techniques, explained visually:
Traditional full fine-tuning is impractical for LLMs, which have billions of parameters and weigh hundreds of GBs.
Since this kind of computing isn't accessible to everyone, parameter-efficient finetuning (PEFT) is extensively used.
Today, we’ll cover the top 5 PEFT techniques, step by step.
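As a teaser, here's the most popular of the five — LoRA — sketched with Hugging Face peft (GPT-2 and the hyperparameters are just placeholders):

```python
# Wrap a model with low-rank adapters; only the adapters are trained.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a tiny fraction of params is trainable
```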
Sep 6 • 13 tweets • 4 min read
Let's generate our own LLM fine-tuning dataset (100% local):
Before we begin, here's what we're doing today!
We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?
Finally, we'll create our own instruction fine-tuning dataset.
Let's dive in!
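A bare-bones sketch of the generation step, assuming a local model served via Ollama (model name, seed topics, and prompt are illustrative; the thread's full pipeline does more):

```python
# Generate instruction/response pairs with a local model and save them as JSONL.
import json
import ollama

seed_topics = ["explain list comprehensions", "what is gradient descent"]
pairs = []

for topic in seed_topics:
    answer = ollama.chat(
        model="llama3.2",  # assumption: any model pulled into Ollama
        messages=[{"role": "user", "content": f"Answer as a helpful tutor: {topic}"}],
    )["message"]["content"]
    pairs.append({"instruction": topic, "output": answer})

with open("instruction_dataset.jsonl", "w") as f:
    for p in pairs:
        f.write(json.dumps(p) + "\n")
```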
Sep 4 • 11 tweets • 4 min read
7 LLM generation parameters, clearly explained (with visuals):
Every generation from an LLM is shaped by parameters under the hood.
Knowing how to tune them is important for producing sharper, more controlled outputs.
The visual shows 7 parameters that matter most.
Let's understand them one by one!
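For reference, here's how several of these parameters appear in a typical API call (OpenAI SDK shown; values and model name are illustrative, and most providers expose equivalents):

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}],
    temperature=0.7,        # randomness of sampling
    top_p=0.9,              # nucleus sampling cutoff
    max_tokens=40,          # cap on output length
    frequency_penalty=0.5,  # discourage repeated tokens
    presence_penalty=0.2,   # encourage new topics
    stop=["\n\n"],          # hard stop sequence(s)
)
print(resp.choices[0].message.content)
```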
Aug 28 • 9 tweets • 3 min read
Temperature in LLMs, clearly explained (with code):
Let's prompt OpenAI GPT-3.5 twice with a very low temperature value.
It produces (nearly) identical responses from the LLM.
Check the response below👇
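Here's a rough sketch of that experiment with the OpenAI SDK (prompt and model are illustrative; at temperature 0 the outputs should be nearly identical, though the API isn't perfectly deterministic):

```python
from openai import OpenAI

client = OpenAI()

def ask(temp: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Give me a one-line startup idea."}],
        temperature=temp,
    )
    return resp.choices[0].message.content

print(ask(0.0))
print(ask(0.0))   # expected: (nearly) the same answer
print(ask(1.5))   # high temperature: noticeably more varied
```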
Aug 27 • 11 tweets • 4 min read
There's a new way to build production-grade MCP servers.
- It takes less than a minute.
- You don't have to write any code.
- You can integrate with 100k+ tools.
Here's a step-by-step breakdown (100% local):
To build MCP servers from scratch with custom tools, one has to:
- read the API docs
- implement MCP tools
- test them, and much more
Today, let's learn how to simplify this and build production-grade MCP servers using Postman's MCP Generator (free to use).
Let's dive in!
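For contrast, this is roughly what the hand-written path looks like — a minimal MCP server with the official Python SDK (the tool here is a stub, not part of the Postman flow):

```python
# A tiny MCP server exposing one custom tool over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-demo")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city} today."

if __name__ == "__main__":
    mcp.run()  # serve the tool so MCP clients can call it
```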
Aug 25 • 12 tweets • 4 min read
I removed 74% of neurons from a neural network.
It dropped the accuracy by just 0.50%.
Here's a breakdown (with code):
A trained neural network typically has many neurons that contribute little to its performance.
But they still consume memory.
These can be removed without significantly compromising accuracy.
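A hedged sketch of the idea using PyTorch's built-in pruning utilities (structured L2 pruning of a single layer; the thread measures the accuracy impact on a full trained network):

```python
# Zero out 74% of the output neurons of a layer by L2 norm.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 256)
prune.ln_structured(layer, name="weight", amount=0.74, n=2, dim=0)

pruned_rows = (layer.weight.abs().sum(dim=1) == 0).float().mean().item()
print(f"{pruned_rows:.0%} of neurons zeroed out")   # ~74%
```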
Let's understand how the context length of LLMs can be extended:
In a traditional transformer, a model processing "8x" tokens requires 64 times more computation (quadratic growth) than one handling "x" tokens.
Thus, extending the context window isn't as simple as just increasing the size of the matrices.
Check this 👇
Aug 22 • 10 tweets • 3 min read
You are in an ML interview.
Your interviewer asks: "Why is the kernel trick called a trick?"
Here's how to answer (with simple maths):
Many ML algorithms use kernels for robust modeling, like SVM and KernelPCA.
If we have two n-dimensional vectors, a kernel function lets us compute their dot product in m-dimensional space (m>>n) without explicitly projecting the vectors.
Let's understand more with maths!
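A tiny numerical check you can run: for the degree-2 polynomial kernel, K(x, y) = (x · y)² equals the dot product after explicitly projecting with φ(x) = (x₁², √2·x₁x₂, x₂²):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

def phi(v):
    # explicit projection into the 3-D feature space
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

explicit = phi(x) @ phi(y)   # dot product in the projected space
kernel = (x @ y) ** 2        # kernel computed directly in the original 2-D space

print(explicit, kernel)      # both 121.0 -> same value, without ever projecting
```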
Aug 20 • 12 tweets • 4 min read
DeepMind built a simple RAG technique that:
- reduces hallucinations by 40%
- improves answer relevancy by 50%
Let's understand how to use it in RAG systems (with code):
Most RAG apps fail because of poor retrieval. Today, we'll build a RAG system that self-corrects inaccurate retrievals using:
- @firecrawl_dev for scraping
- @milvusio as vectorDB
- @beam_cloud for deployment
- @Cometml Opik for observability
- @Llama_Index for orchestration
Let's go!
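A stack-agnostic sketch of the self-correction step (the retriever, web search, and models here are stubs and assumptions standing in for the tools listed above):

```python
# Grade each retrieved chunk with the LLM; if nothing passes, fall back to web search.
from openai import OpenAI

client = OpenAI()

def retrieve(q):     # stub: vector DB lookup (e.g. Milvus)
    return ["Milvus is an open-source vector database built for similarity search."]

def web_search(q):   # stub: web-search fallback (e.g. Firecrawl)
    return [f"(fresh web result about: {q})"]

def grade(question, chunk):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content":
                   f"Question: {question}\nChunk: {chunk}\nIs the chunk relevant? Answer yes or no."}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def corrective_rag(question):
    docs = [d for d in retrieve(question) if grade(question, d)]
    if not docs:                      # retrieval failed -> correct it with web search
        docs = web_search(question)
    context = "\n".join(docs)
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return answer.choices[0].message.content

print(corrective_rag("What is Milvus?"))
```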
Aug 17 • 11 tweets • 4 min read
Model Context Protocol (MCP), clearly explained (with visuals):
MCP is like a USB-C port for your AI applications.
Just as USB-C offers a standardized way to connect devices to various accessories, MCP standardizes how your AI apps connect to different data sources and tools.