Daily tutorials and insights on DS, ML, LLMs, and RAGs • Co-founder @dailydoseofds_ • IIT Varanasi • ex-AI Engineer @ MastercardAI
Jan 30 • 10 tweets • 3 min read
Let's build a Multimodal RAG with DeepSeek's latest Janus-Pro (100% local):
The video depicts a multimodal RAG running locally on your computer.
We use:
- ColPali to understand and embed docs using vision capabilities.
- @qdrant_engine as the vector database.
- @deepseek_ai's latest Janus-Pro multimodal LLM to generate a response.
Let's build it!
Jan 28 • 10 tweets • 3 min read
Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally:
Before we begin, here's what we'll be doing.
To do this, we'll use:
- @UnslothAI for efficient fine-tuning.
- @ollama to run it locally.
Let's begin!
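A minimal sketch of the setup step with Unsloth, assuming its 4-bit DeepSeek-R1-Distill-Llama-8B checkpoint (the model name, LoRA rank, and training details are illustrative, not the thread's final code):

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized distilled-Llama R1 checkpoint (illustrative name).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of the weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, training typically runs through TRL's SFTTrainer on an
# instruction dataset, and the fine-tuned model is exported to run via Ollama.
```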
Jan 27 • 11 tweets • 3 min read
5 LLM fine-tuning techniques, clearly explained (with visuals):
Traditional fine-tuning (depicted below) is infeasible with LLMs.
This is because LLMs have billions of parameters and are hundreds of GBs in size.
Not everyone has access to such computing infrastructure, and it does not make business sense either.
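The practical workaround is parameter-efficient fine-tuning, where only small adapter weights are trained. As one common example of this (LoRA), here's a minimal sketch with Hugging Face peft; the base model and hyperparameters are illustrative:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM that fits in memory works here.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# LoRA freezes the original weights and learns small low-rank updates instead.
config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically well under 1% of all params
```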
Jan 26 • 7 tweets • 2 min read
95% of Jupyter Notebook users don't know these 5 AWESOME tricks:
1) Retrieve a cell’s output in Jupyter
If you often forget to assign the results of a Jupyter cell to a variable, you can use the `Out` dictionary to retrieve the output:
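For instance (the keys of `Out` are the cells' execution counts):

```python
# In [1]:
2 + 3            # displays 5, but we never assigned it to a variable

# In [2]:
Out[1] * 10      # retrieves cell 1's output from the Out dictionary -> 50

# In [3]:
_2               # the underscore shortcuts work too: _2 is the same as Out[2]
```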
Jan 25 • 7 tweets • 2 min read
Let's build a mini-ChatGPT that's powered by DeepSeek-R1 (100% local):
Here's a mini-ChatGPT app that runs locally on your computer. You can chat with it just like you would chat with ChatGPT.
We use:
- @DeepSeek_AI R1 as the LLM
- @Ollama to locally serve R1
- @chainlit_io for the UI
Let's build it!
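A hedged sketch of the core app (it assumes `ollama pull deepseek-r1` has already been run; the file and model names are illustrative):

```python
# app.py  ->  run with: chainlit run app.py
import chainlit as cl
import ollama

@cl.on_message
async def main(message: cl.Message):
    # Forward the user's message to the locally served DeepSeek-R1 model.
    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": message.content}],
    )
    # Send the model's reply back to the Chainlit UI.
    await cl.Message(content=response["message"]["content"]).send()
```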
Jan 23 • 8 tweets • 3 min read
5 most popular Agentic AI design patterns, clearly explained (with visuals):
Agentic behaviors allow LLMs to refine their output by incorporating self-evaluation, planning, and collaboration!
The following visual depicts the 5 most popular design patterns employed in building AI agents.
Let's understand them below!
Jan 22 • 9 tweets • 3 min read
What is DeepSeek-R1 and how to use it, clearly explained:
For starters, DeepSeek AI released open-weight reasoning models (comparable to OpenAI's o1).
What's crazy is that DeepSeek-R1 achieves performance similar to OpenAI o1 but at a much lower cost (~95% cheaper).
For instance, per 1M tokens:
• OpenAI o1: $60.00
• DeepSeek R1: $2.19 (95% cheaper).
Jan 21 • 10 tweets • 3 min read
5 chunking strategies for RAG, clearly explained (with visuals):
The documents added to a RAG system can be pretty large.
Chunking involves dividing these large documents into smaller, manageable pieces.
This ensures the text fits the input size of the embedding model, and it also improves the retrieval quality.
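As a baseline, the simplest strategy (fixed-size chunking with a small overlap) can be sketched in a few lines; the sizes below are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks, overlapping neighbours slightly so
    sentences cut at a boundary still appear in full in at least one chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = "…your extracted document text…"   # placeholder
chunks = chunk_text(document)                 # each chunk is then embedded and stored
```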
Jan 19 • 9 tweets • 3 min read
Let's build a multi-agent internet research assistant with OpenAI Swarm & Llama 3.2 (100% local):
Before we begin, here's what we're building!
The app takes a user query, searches the web for it, and turns it into a well-crafted article.
Tool stack:
- @ollama for running LLMs locally.
- @OpenAI Swarm for multi-agent orchestration.
- @Streamlit for the UI.
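A hedged sketch of the orchestration layer (the web-search tool and Streamlit UI are omitted, and the agent instructions and model name are illustrative); it assumes Ollama's OpenAI-compatible endpoint at localhost:11434:

```python
from openai import OpenAI
from swarm import Swarm, Agent

# Ollama exposes an OpenAI-compatible API; point Swarm's client at it.
ollama_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
client = Swarm(client=ollama_client)

researcher = Agent(
    name="Researcher",
    model="llama3.2",
    instructions="Research the user's topic and produce concise bullet-point notes.",
)
writer = Agent(
    name="Writer",
    model="llama3.2",
    instructions="Turn the research notes into a well-structured article.",
)

notes = client.run(agent=researcher,
                   messages=[{"role": "user", "content": "RAG vs fine-tuning"}])
summary = notes.messages[-1]["content"]
article = client.run(agent=writer,
                     messages=[{"role": "user",
                                "content": f"Write an article from these notes:\n{summary}"}])
print(article.messages[-1]["content"])
```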
Jan 17 • 7 tweets • 2 min read
Traditional RAG vs. Agentic RAG, clearly explained (with visuals):
Traditional RAG has many issues:
- It retrieves once and generates once. If the context isn't enough, it cannot dynamically search for more info.
- It cannot reason through complex queries.
- The system can't modify its strategy based on the problem.
Jan 11 • 8 tweets • 2 min read
Bayes' Theorem, clearly explained:
Bayes' Theorem is a cornerstone of probability theory!
It calculates the probability of an event, given that another event has occurred.
It's like updating your guess with fresh information!
Before we delve into the details, let's take a quick look at its formula:
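In its standard form:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

Here, P(A) is the prior, P(B | A) the likelihood, P(B) the evidence, and P(A | B) the updated (posterior) probability of A after observing B.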
Jan 7 • 9 tweets • 3 min read
15 ways to optimize neural network training, clearly explained:
Before we dive in, this visual explains what we are discussing today.
Let's understand them now.
Jan 3 • 10 tweets • 3 min read
All-reduce and ring-reduce for multi-GPU training, clearly explained (with visuals):
Data parallelism:
• Replicates the model across all GPUs.
• Divides the data into smaller batches for every GPU.
• Computes the gradients on each GPU.
Since each GPU processes a different data chunk, the gradients must be synchronized across GPUs before the next iteration.
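That synchronization is an all-reduce over the gradients. A minimal sketch with torch.distributed (it assumes the process group is already initialized):

```python
import torch
import torch.distributed as dist

def sync_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all GPUs after loss.backward() (all-reduce)."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum each gradient tensor across every GPU, then take the mean.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```

In practice, DistributedDataParallel handles this for you, and backends like NCCL implement the reduction as a ring all-reduce so the per-GPU communication cost stays roughly constant as more GPUs are added.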
Dec 30, 2024 • 9 tweets • 3 min read
Active learning in ML, clearly explained (with visuals):
As the name suggests, the idea is to build the model with active human feedback on examples it is struggling with.
The visual below summarizes this:
Let’s get into the details.
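A tiny, self-contained sketch of one flavour of this loop (uncertainty sampling on synthetic data; in a real system the queried labels would come from a human annotator):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy setup: a small labelled seed set and a large "unlabelled" pool.
X, y = make_classification(n_samples=1000, random_state=0)
X_seed, y_seed, X_pool, y_pool = X[:50], y[:50], X[50:], y[50:]

for _ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)

    # Uncertainty sampling: query the examples the model is least sure about.
    probs = model.predict_proba(X_pool)
    uncertainty = 1 - probs.max(axis=1)
    query = np.argsort(uncertainty)[-10:]       # 10 most uncertain examples

    # A human would label these; here we simply reveal y_pool for illustration.
    X_seed = np.vstack([X_seed, X_pool[query]])
    y_seed = np.concatenate([y_seed, y_pool[query]])
    X_pool = np.delete(X_pool, query, axis=0)
    y_pool = np.delete(y_pool, query)
```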
Dec 26, 2024 • 7 tweets • 2 min read
Traditional RAG vs. HyDE, clearly explained (with visuals):
Questions are not semantically similar to their answers.
As a result, several irrelevant contexts get retrieved because they happen to have a higher cosine similarity with the question than the truly relevant ones.
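HyDE (Hypothetical Document Embeddings) works around this by first generating a hypothetical answer and embedding that instead of the raw question. A minimal sketch with the OpenAI client (the model names are illustrative):

```python
from openai import OpenAI

client = OpenAI()

def hyde_embedding(question: str) -> list[float]:
    # 1) Ask an LLM to write a hypothetical (possibly imperfect) answer.
    hypothetical = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short passage that answers: {question}"}],
    ).choices[0].message.content

    # 2) Embed the hypothetical answer instead of the raw question;
    #    answer-like text sits closer to the stored answer chunks.
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=hypothetical,
    ).data[0].embedding
```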
Dec 24, 2024 • 9 tweets • 3 min read
What is Temperature in LLMs, clearly explained (with code demo):
Let's prompt OpenAI GPT-3.5.
A low temperature value produces near-identical responses from the LLM across runs (shown below):
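A minimal version of that demo with the openai client (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()

def ask(temperature: float) -> str:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=temperature,
        messages=[{"role": "user", "content": "Describe the ocean in one sentence."}],
    ).choices[0].message.content

# temperature=0 -> near-identical outputs across runs
print(ask(0), ask(0), sep="\n")

# temperature=1.5 -> noticeably more varied (and more random) outputs
print(ask(1.5), ask(1.5), sep="\n")
```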
Dec 23, 2024 • 9 tweets • 3 min read
Tool calling in LLMs, clearly explained (with code):
When generating text, the LLM may need to invoke external tools or APIs to perform specific tasks beyond its built-in capabilities.
This is known as tool calling, and it turns the LLM into more of a coordinator.
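A minimal sketch with the OpenAI client and a hypothetical get_weather tool (the model name is illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe the tool so the model knows when (and how) to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                      # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Instead of answering directly, the model returns a structured tool call;
# your code executes it and feeds the result back for the final answer.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```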
Dec 18, 2024 • 6 tweets • 1 min read
Prompting vs. RAG vs. Finetuning, which one is best for you, clearly explained:
When building LLM-based apps, it is unlikely you can start using the model right away without adjustments. To maintain high utility, you either need:
• Prompt engineering
• Fine-tuning
• RAG
• Or a hybrid approach (RAG + fine-tuning)
This visual will help you decide:
Mar 5, 2024 • 10 tweets • 3 min read
Sigmoid and softmax are not implemented the way most people think:
The most common way one would write a custom implementation for Sigmoid is as follows:
However, there is a big problem with this specific implementation, which is why most frameworks don’t implement it this way.
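The naive version most people write, and the sign-splitting rewrite that avoids the overflow, look roughly like this (a sketch, not any particular framework's source):

```python
import numpy as np

def naive_sigmoid(x):
    # Direct translation of the textbook formula.
    return 1 / (1 + np.exp(-x))

naive_sigmoid(-1000.0)   # RuntimeWarning: exp(1000) overflows to inf before the division

def stable_sigmoid(x):
    # Split on the sign so exp() is only ever called on non-positive values.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    e = np.exp(x[~pos])          # x < 0 here, so exp(x) cannot overflow
    out[~pos] = e / (1 + e)
    return out
```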