Akshay 🚀
Apr 19, 2024 · 11 tweets · 4 min read
Let's build a RAG app using MetaAI's Llama-3 (100% local):
Before we begin, take a look at what we're about to create!

Here's what you'll learn:

- @Ollama for locally serving an LLM (Llama-3)
- @Llama_Index for orchestration
- @Streamlit for building the UI
- @LightningAI for development & hosting

Let's go! 🚀
The architecture diagram presented below illustrates some of the key components & how they interact with each other!

It will be followed by detailed descriptions & code for each component:

[Image: architecture diagram of the RAG pipeline]
1️⃣ & 2️⃣ : Loading the knowledge base

A knowledge base is a collection of relevant and up-to-date information that serves as a foundation for RAG. In our case, it's the docs stored in a directory.

Here's how you can load it as document objects in LlamaIndex:
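The code itself was shared as an image in the original tweet; below is a minimal sketch of what this typically looks like with LlamaIndex's SimpleDirectoryReader (the ./docs path is an assumption, point it at your own folder):

```python
from llama_index.core import SimpleDirectoryReader

# Load every file under ./docs as LlamaIndex Document objects
loader = SimpleDirectoryReader(
    input_dir="./docs",   # assumed location of your knowledge base
    recursive=True,       # also pick up files in sub-folders
)
docs = loader.load_data()
print(f"Loaded {len(docs)} documents")
```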
3️⃣ The embedding model

An embedding is a meaningful representation of text in the form of numbers.

The embedding model is responsible for creating embeddings for the document chunks & user queries.

We are using @SnowflakeDB's `arctic-embed-m`, one of the best models in its class.
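The original code is in the attached image; a sketch of wiring this model into LlamaIndex via the llama-index-embeddings-huggingface integration would look roughly like this (the exact setup in the thread may differ):

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Embed document chunks & user queries with Snowflake's arctic-embed-m,
# pulled from the Hugging Face Hub
Settings.embed_model = HuggingFaceEmbedding(
    model_name="Snowflake/snowflake-arctic-embed-m",
    trust_remote_code=True,
)
```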
4️⃣ Indexing & storing

Embeddings created by the embedding model are stored in a vector store that offers fast retrieval and similarity search by creating an index over our data.

By default, LlamaIndex provides an in-memory vector store that’s great for quick experimentation.
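As a rough sketch (assuming the `docs` loaded earlier and the embedding model configured above), building that in-memory index looks like this:

```python
from llama_index.core import VectorStoreIndex

# Chunk the documents, embed each chunk with the configured embedding model,
# and store the vectors in LlamaIndex's default in-memory vector store
index = VectorStoreIndex.from_documents(docs, show_progress=True)
```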
5️⃣ Creating a prompt template

A custom prompt template is used to refine the response from the LLM & include the retrieved context as well:
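The template in the image isn't reproduced here; the sketch below shows the general shape of a LlamaIndex PromptTemplate for this job (the wording of the instructions is an assumption):

```python
from llama_index.core import PromptTemplate

# {context_str} is filled with the retrieved chunks and {query_str}
# with the user's question at query time
qa_prompt_tmpl = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, answer the query crisply. "
    "If the answer is not in the context, say that you don't know.\n"
    "Query: {query_str}\n"
    "Answer: "
)
```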
6️⃣ & 7️⃣ Setting up a query engine

The query engine takes a query string & uses it to fetch relevant context, then sends both as a prompt to the LLM to generate a final natural-language response.

Here's how you set it up:
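Again, the exact code lives in the image; here is a hedged sketch, assuming Llama-3 is served locally through Ollama and the `index` and `qa_prompt_tmpl` objects from the previous steps are in scope:

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Llama-3 served locally by Ollama (assumes `ollama pull llama3` has been run
# and the Ollama server is listening on its default port)
Settings.llm = Ollama(model="llama3", request_timeout=120.0)

# Retrieve the top-k most similar chunks and send them, together with the
# question, to the LLM using the custom prompt
query_engine = index.as_query_engine(similarity_top_k=3)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

response = query_engine.query("What do the docs say about X?")
print(response)
```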
8️⃣ The Chat interface

We create a UI using Streamlit to provide a chat interface for our RAG application.

The code for this & everything we've discussed so far is shared in the next tweet!

Check this out 👇
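The full Streamlit app is linked in the next tweet; a minimal sketch of the chat loop, assuming the `query_engine` from the previous step is available, could look like this:

```python
import streamlit as st

st.title("Chat with your docs (Llama-3, 100% local)")

# Keep the chat history across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay earlier messages
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle a new question
if prompt := st.chat_input("Ask something about your documents"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # query_engine is the LlamaIndex query engine built in the previous step
    answer = str(query_engine.query(prompt))
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

Run it with `streamlit run app.py` (assuming the script is saved as app.py).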
I used @LightningAI⚡️ Studio for developing this application!

You will find all the code & everything you need to run it! ✨

Clone a FREE studio now & take it for a spin...👇
lightning.ai/lightning-ai/s…
If you're interested in:

- Python 🐍
- Machine Learning 🤖
- AI Engineering ⚙️

Find me → @akshay_pachaar ✔️
Every day, I share tutorials on the above topics!

Cheers! 🥂


More from @akshay_pachaar

Dec 18, 2025
Turn any Autoregressive LLM into a Diffusion LM.

dLLM is a Python library that unifies the training & evaluation of diffusion language models.

You can also use it to turn ANY autoregressive LM into a diffusion LM with minimal compute.

100% open-source.
Here's why this matters:

Traditional autoregressive models generate text left-to-right, one token at a time. Diffusion models work differently: they refine the entire sequence iteratively, giving you better control over generation quality and more flexible editing capabilities.
dLLM GitHub:

(don't forget to star 🌟)
github.com/ZHZisZZ/dllm
Dec 6, 2025
You're in a Research Scientist interview at Google.

Interviewer: We have a base LLM that's terrible at maths. How would you turn it into a maths & reasoning powerhouse?

You: I'll get some problems labeled and fine-tune the model.

Interview over.

Here's what you missed:
When outputs are verifiable, labels become optional.

Maths, code, and logic can be automatically checked and validated.

Let's use this fact to build a reasoning model without manual labelling.

We'll use:

- @UnslothAI for parameter-efficient finetuning.
- @HuggingFace TRL to apply GRPO.

Let's go! 🚀
What is GRPO?

Group Relative Policy Optimization is a reinforcement learning method that fine-tunes LLMs for math and reasoning tasks using deterministic reward functions, eliminating the need for labeled data.

Here's a brief overview of GRPO before we jump into code:
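The visual overview and the full code are in the later tweets of the thread; as a hedged sketch of the core idea (the dataset, base model, and reward logic below are illustrative assumptions, and Unsloth's parameter-efficient setup is omitted), a deterministic reward plugged into TRL's GRPOTrainer might look like this:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Deterministic reward: GSM8K answers end with "#### <number>", so correctness
# can be checked automatically instead of relying on human labels.
def correctness_reward(completions, answer, **kwargs):
    final_answers = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if fa in c else 0.0 for c, fa in zip(completions, final_answers)]

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})  # GRPOTrainer expects a "prompt" column

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # placeholder base model, not the one from the thread
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="grpo-maths"),
    train_dataset=dataset,
)
trainer.train()
```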
Dec 5, 2025
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training:

(detailed explanation ...🧵)
First, let's look at some basic techniques:

1) Use efficient optimizers—AdamW, Adam, etc.

2) Utilize hardware accelerators (GPUs/TPUs).

3) Max out the batch size.

4) Use multi-GPU training through Model/Data/Pipeline/Tensor parallelism.

Check the visual👇
5) Bayesian optimization for hyperparameter optimization:

This technique takes informed steps based on the results of the previous hyperparameter configs.

This way, the model converges to an optimal set of hyperparameters much faster.

Check these results 👇
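The thread doesn't name a specific library; purely as an illustration of the idea, here is a toy Bayesian-style search with Optuna, whose default TPE sampler picks each new trial based on the results of previous ones:

```python
import optuna

# Toy objective standing in for "train the model, return validation loss"
def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Placeholder for a real train/validate loop:
    return (lr - 1e-3) ** 2 + (dropout - 0.2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```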
Nov 23, 2025
You’re in an ML Engineer interview at Google.

Interviewer: We need to train an LLM across 1,000 GPUs. How would you make sure all GPUs share what they learn?

You: Use a central parameter server to aggregate and redistribute the weights.

Interview over.

Here’s what you missed:
One major run-time bottleneck in multi-GPU training happens during GPU synchronization.

For instance, in multi-GPU training via data parallelism:

- The same model is distributed to different GPUs.
- Each GPU processes a different subset of the whole dataset.

Check this 👇
This leads to different gradients across different devices.

So, before updating the model parameters on each GPU device, we must communicate the gradients to all other devices to sync them.

Let’s understand 2 common strategies next!
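The two strategies themselves are covered in the rest of the thread; purely as an illustration of the gradient-sync step, here is what a manual all-reduce (the collective that PyTorch's DistributedDataParallel uses under the hood) looks like:

```python
import torch
import torch.distributed as dist

# Illustrative sketch only. Assumes the process group was already initialized,
# e.g. by launching the script with `torchrun --nproc_per_node=<num_gpus> train.py`.
def sync_gradients(model: torch.nn.Module) -> None:
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this parameter's gradient across every GPU...
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            # ...then average, so every replica applies the same update
            param.grad /= world_size
```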
Nov 21, 2025
NOBODY wants to send their data to Google or OpenAI.

Yet here we are, shipping proprietary code, customer information, and sensitive business logic to closed-source APIs we don't control.

While everyone's chasing the latest closed-source releases, open-source models are quietly becoming the practical choice for many production systems.

Here's what everyone is missing:

Open-source models are catching up fast, and they bring something the big labs can't: privacy, speed, and control.

I built a playground to test this myself. I used CometML's Opik to evaluate models on real code-generation tasks, testing correctness, readability, and best practices against actual GitHub repos.

Here's what surprised me:

OSS models like MiniMax-M2 and Kimi K2 performed on par with the likes of Gemini 3 and Claude Sonnet 4.5 on most tasks.

But practically, MiniMax-M2 turns out to be the winner: it's twice as fast and 12x cheaper than models like Sonnet 4.5.

Well, this isn't just about saving money.

When your model is smaller and faster, you can deploy it in places closed-source APIs can't reach:

↳ Real-time applications that need sub-second responses
↳ Edge devices where latency kills user experience
↳ On-premise systems where data never leaves your infrastructure

MiniMax-M2 runs with only 10B activated parameters. That efficiency means lower latency, higher throughput, and the ability to handle interactive agents without breaking the bank.

The intelligence-to-cost ratio here changes what's possible.

You're not choosing between quality and affordability anymore. You're not sacrificing privacy for performance. The gap is closing, and in many cases, it's already closed.

If you're building anything that needs to be fast, private, or deployed at scale, it's worth taking a look at what's now available.

MiniMax-M2 is 100% open-source, free for developers right now. I have shared the link to their GitHub repo in the next tweet.

You will also find the code for the playground and evaluations I've done.
@MiniMax__AI GitHub repo for M2:

(don't forget to star 🌟)
github.com/MiniMax-AI/Min…
@MiniMax__AI Find the code for the playground and the evaluation done using @Cometml Opik: github.com/patchy631/ai-e…
Oct 27, 2025
Claude Skills might be the biggest upgrade to AI agents so far!

Some say it's even bigger than MCP.

I've been testing skills for the past 3-4 days, and they're solving a problem most people don't talk about: agents just keep forgetting everything.

In this video, I'll share everything I've learned so far.

It covers:

> The core idea (skills as SOPs for agents)
> Anatomy of a skill
> Skills vs. MCP vs. Projects vs. Subagents
> Building your own skill
> Hands-on example

Skills are the early signs of continual learning, and they can change how we work with agents forever!

Here's everything you need to know:
Skills vs. Projects vs. Subagents: [comparison image]
If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️
For more insights and tutorials on LLMs, AI Agents, and Machine Learning!