Akshay 🚀 · Feb 15
Let's fine-tune DeepSeek-R1 (distilled Llama) 100% locally:
Before we begin, here’s what we’ll be doing:

We’ll fine-tune a private, locally running DeepSeek-R1 (a distilled Llama variant).

Tech stack:

- @UnslothAI for efficient fine-tuning.
- @Ollama to run it locally.

Let’s go! 🚀
1️⃣ Load the model

We begin by loading DeepSeek-R1's distilled Llama-8B model and its tokenizer using Unsloth:
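Here's a minimal sketch of that step. The checkpoint name and context length below are assumptions (check Unsloth's model hub for the exact 4-bit checkpoint):

```python
from unsloth import FastLanguageModel

# Load the distilled Llama-8B variant of DeepSeek-R1 in 4-bit.
# Checkpoint name and max_seq_length are assumptions for this sketch.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,  # context length used during fine-tuning
    load_in_4bit=True,    # 4-bit quantization so it fits on one consumer GPU
)
```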
2️⃣ Define LoRA Config

To avoid updating all of the model's weights, we use a parameter-efficient technique: LoRA.

In this code, we use Unsloth's PEFT wrapper and specify:

- The model
- The LoRA rank (r)
- The modules to fine-tune
- A few more parameters (see the sketch below)
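A minimal sketch of that LoRA setup; the rank, alpha, and target-module list here are common Unsloth defaults, not necessarily the exact values used in the thread:

```python
# Wrap the base model with LoRA adapters via Unsloth's PEFT helper.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: smaller r = fewer trainable parameters
    target_modules=[  # attention + MLP projections to adapt
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,    # scaling factor applied to the LoRA updates
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # memory-efficient backprop
    random_state=3407,
)
```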
3️⃣ Prepare dataset

Next, we turn the Alpaca dataset into a conversational dataset.

The conversation_extension parameter sets the number of user messages in a single conversation.
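Roughly, the preparation looks like this; the dataset id and merged-prompt template are assumptions based on Unsloth's conversational examples:

```python
from datasets import load_dataset
from unsloth import to_sharegpt, standardize_sharegpt

# Load the Alpaca instruction dataset (exact dataset id is an assumption).
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

# Merge instruction + input into one prompt, then stitch multiple rows into
# multi-turn conversations; conversation_extension sets how many user
# messages end up in a single conversation.
dataset = to_sharegpt(
    dataset,
    merged_prompt="{instruction}[[\nYour input is:\n{input}]]",
    output_column_name="output",
    conversation_extension=3,
)
dataset = standardize_sharegpt(dataset)  # normalize to role/content format
```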
4️⃣ Define Trainer

Here, we create a Trainer object, specifying the training config: the model, tokenizer, learning rate, and more.

Check this out👇
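A sketch of that setup with TRL's SFTTrainer. The hyperparameters are illustrative, it assumes the dataset has been rendered to a "text" column (e.g. via Unsloth's chat-template helpers), and the argument names follow the TRL versions used in Unsloth's notebooks (newer TRL moves some of these into SFTConfig):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the formatted conversations
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        warmup_steps=5,
        max_steps=60,        # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,     # print the loss at every step
        output_dir="outputs",
    ),
)
```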
5️⃣ Train

With that done, we kick off training. The loss decreases steadily, which tells us the model is learning.

Check the code and output👇
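In code, that's a single call; the metrics come from the standard TrainOutput that Trainer returns:

```python
# Kick off fine-tuning; the loss is logged at every step.
stats = trainer.train()

# A steadily decreasing train loss is the signal we want to see.
print(stats.metrics["train_loss"])
```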
6️⃣ Export to Ollama

Finally, we export the model to Ollama as follows.

Done!
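A sketch of the export; the model name "deepseek-finetuned" and the q4_k_m quantization are placeholder choices, and it assumes Unsloth writes a Modelfile next to the GGUF (as its Ollama notebook does):

```python
# Convert the fine-tuned model to GGUF so Ollama can serve it.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Register the exported model with Ollama (run in a shell):
#   ollama create deepseek-finetuned -f ./model/Modelfile
#   ollama run deepseek-finetuned
```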
We have fine-tuned DeepSeek (distilled Llama).

Now we can interact with it like any other model running on Ollama using:

- The CLI
- Ollama's Python package
- Ollama's LlamaIndex integration, etc.
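For example, with Ollama's Python package (the model name matches the placeholder we assumed in the export step):

```python
import ollama  # pip install ollama

# Chat with the fine-tuned model like any other Ollama model.
response = ollama.chat(
    model="deepseek-finetuned",  # hypothetical name from the export step
    messages=[{"role": "user", "content": "Explain LoRA in one paragraph."}],
)
print(response["message"]["content"])
```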
That's a wrap!

And, if you enjoyed this breakdown:

Find me → @akshay_pachaar ✔️

Every day, I share insights and tutorials on AI and Machine Learning.
