Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Akshay 🚀

@akshay_pachaar

Feb 15 • 10 tweets • 3 min read • Read on X

Scrolly

Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally:

Before we begin, here’s what we’ll be doing:

We’ll fine-tune our private and locally running DeepSeek-R1 (a distilled Llama variant).

Tech stack:

- @UnslothAI for efficient fine-tuning.
- @Ollama to run it locally.

Let’s go! 🚀

1️⃣ Load the model

We begin by loading the Distilled Llama-8B model and the tokenizer for DeepSeek-R1 using Unsloth:

2️⃣ Define LoRA Config

We must use efficient techniques like LoRA to avoid fine-tuning the entire model's weights.

In this code, we utilize Unsloth's PEFT by specifying:

- The model
- LoRA low-rank (r)
- Modules for fine-tuning
- A few more parameters

3️⃣ Prepare dataset

Next, we use the Alpaca dataset to prepare a conversation dataset.

The conversation_extension parameter defines the number of user messages in a single conversation.

4️⃣ Define Trainer

Here, we create a Trainer object by specifying the training config like learning rate, model, tokenizer, and more.

Check this out👇

5️⃣ Train

With that done, we initiate training. We notice a decreasing loss, which means the model is fine-tuning well.

Check this code and output👇

6️⃣ Export to Ollama

Finally, we export the model to Ollama as follows.

Done!

We have fine-tuned DeepSeek (distilled Llama).

Now we can interact with it like any other model running on Ollama using:

- The CLI
- Ollama's Python package
- Ollama's LlamaIndex integration, etc.

That's a wrap!

And, if you enjoyed this breakdown:

Find me → @akshay_pachaar ✔️

Everyday, I share insights and tutorials around AI and Machine Learning.

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @akshay_pachaar

Akshay 🚀

@akshay_pachaar

Aug 1

Let's build a (Text2SQL + RAG), hybrid agentic workflow:

Before we dive in, here's a quick demo of what we're building!

Tech stack:

- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI to validate the response
- @OpenRouterAI to access the latest Qwen3

Let's go! 🚀

Here's how our app works:

- LLM processes the query to select a tool
- Converts the query into right format (text/SQL)
- Executes the tool and fetch the output
- Generates a response with enriched context
- Validates the response using Cleanlab's Codex

Now, let's see the code!

Read 14 tweets

Akshay 🚀

@akshay_pachaar

Jul 31

"Attention is all you need" implemented from scratch using PyTorch:

This is the paper that revolutionized AI!

Today, we'll implement:

- The complete Transformer architecture
- Multi-Head Attention mechanism
- Encoder-Decoder structure
- Positional Encoding

Everything in clean, educational Python code!

Let's go! 🚀

Here's the full Transformer model that we'll build piece by piece!

Notice the key components:

- Encoder & Decoder stacks
- Multi-head attention layers
- Position-wise feed-forward networks
- Positional encoding

Now let's break it down! 👇

Read 17 tweets

Akshay 🚀

@akshay_pachaar

Jul 27

I have been fine-tuning LLMs for more that 2 years now!

Here are the top 5 LLM fine-tuning techniques, explained with visuals:

Traditional fine‑tuning is impractical for LLMs (billions of params; 100s GB).

Since this kind of computing isn't accessible to everyone, parameter-efficient finetuning (PEFT) came into existence.

Today, we’ll cover the top 5 PEFT techniques, step by step.

Some background!

LLM weights are matrices of numbers adjusted during finetuning.

Most PEFT techniques involve finding a lower-rank adaptation of these matrices—a smaller-dimensional matrix that can still represent the information stored in the original.

Read 11 tweets

Akshay 🚀

@akshay_pachaar

Jul 25

How LLMs train LLMs, clearly explained (with visuals):

LLMs learn not only from raw text but also from other models.

Google’s Gemma 2 and 3, for example, were distilled from the larger Gemini model.

Today we cover, the three most common knowledge‑distillation methods.

Let's dive in! 🚀

1️⃣ Soft-label Distillation

Generate token-level softmax probabilities over the entire corpus using:

- A frozen, pre-trained Teacher LLM
- An untrained Student LLM

Train the Student LLM to match the Teacher's probabilities.

Check this out👇

Read 10 tweets

Akshay 🚀

@akshay_pachaar

Jul 24

Let's build a "Chat with your Code" RAG app using Qwen3-Coder:

Before we begin, take a look at what we're about to create!

Tech stack:

- @Llama_Index for orchestration
- @Milvusio to self-host a vectorDB
- @CleanlabAI codex to validate the response
- @OpenRouterAI to access @Alibaba_Qwen 3 Coder.

Let's go! 🚀

The architecture diagram presented below illustrates some of the key components & how they interact with each other!

It will be followed by detailed descriptions & code for each component:

Read 13 tweets

Akshay 🚀

@akshay_pachaar

Jul 23

I just built the ultimate MCP server for Multimodal AI.

It lets you do RAG over audio, video, images and text!

100% open-source, here's the full breakdown...👇

Before we dive in, here's a quick demo of what we're building!

Tech stack:

- @pixeltablehq to build the multi-modal AI infrastructure
- @crewAIInc to orchestrate the agentic workflow

Quickly check the thread, then return here for a detailed overview. 🚀

First of all, what is Pixeltable?

Pixeltable is a go-to Python library for Multimodal AI—streamlining entire pipeline from data storage to model execution.

Handles images, videos, text & audio effortlessly.

Our MCP servers are built on top of Pixeltable.

Read 15 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Akshay 🚀

Try unrolling a thread yourself!

More from @akshay_pachaar

Akshay 🚀

Akshay 🚀

Akshay 🚀

Akshay 🚀

Akshay 🚀

Akshay 🚀

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!