
Akshay 🚀

@akshay_pachaar

Sep 2 • 13 tweets • 5 min read


4 stages of training LLMs from scratch, clearly explained (with visuals):

Today, we're covering the 4 stages of building an LLM from scratch and making it useful for real-world use cases.

We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

The visual summarizes these techniques.

Let's dive in!

0️⃣ Randomly initialized LLM

At this point, the model knows nothing.

You ask it “What is an LLM?” and get gibberish like “try peter hand and hello 448Sn”.

It hasn’t seen any data yet; all it has are random weights.
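To make "random weights" concrete, here is a toy sketch, not an actual LLM: a 10-word vocabulary and a table of random logits with one row per previous token. The vocabulary, the bigram-style table, and the names `init_random_model` and `generate` are illustrative assumptions. Sampling from it strings words together with no regard for meaning, which is exactly the kind of gibberish shown above.

```python
import math
import random

# Toy vocabulary (an assumption for illustration; real LLMs use far larger subword vocabularies)
VOCAB = ["try", "peter", "hand", "and", "hello", "448Sn", "what", "is", "an", "LLM"]

def init_random_model(vocab_size, seed=0):
    """A toy 'model': a table of random logits, one row per previous token."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in range(vocab_size)]

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(weights, start_token, n_tokens, seed=0):
    """Sample n_tokens continuations; with random weights the output is gibberish."""
    rng = random.Random(seed)
    ids = [VOCAB.index(start_token)]
    for _ in range(n_tokens):
        probs = softmax(weights[ids[-1]])  # next-token distribution given the previous token
        ids.append(rng.choices(range(len(VOCAB)), weights=probs, k=1)[0])
    return " ".join(VOCAB[i] for i in ids)

weights = init_random_model(len(VOCAB))
print(generate(weights, "LLM", n_tokens=6))
```

Nothing in `weights` encodes grammar or facts yet; every stage that follows is about shaping those numbers.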

Check this 👇

1️⃣ Pre-training

This stage teaches the LLM the basics of language by training it on massive corpora to predict the next token. This way, it absorbs grammar, world facts, etc.

But it’s not good at conversation: when prompted, it simply continues the text.
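The pre-training objective can be sketched as an average cross-entropy over next-token predictions. A minimal pure-Python version, assuming the model has already produced one logit vector per position (`next_token_loss` is a hypothetical helper name, not a library function):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token_ids[t + 1] from everything up to position t.

    logits_per_position[t] holds the model's scores over the vocabulary for the
    token that should follow position t, so it needs len(token_ids) - 1 rows.
    """
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]                # the actual next token in the corpus
        losses.append(-math.log(probs[target]))  # penalize low probability on it
    return sum(losses) / len(losses)

tokens = [0, 1, 2, 3]                   # a 4-token "document" over a 4-word vocab
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3    # no preference: every token equally likely
confident = [[0.0, 10.0, 0.0, 0.0],
             [0.0, 0.0, 10.0, 0.0],
             [0.0, 0.0, 0.0, 10.0]]     # a model that has learned this sequence
print(next_token_loss(uniform, tokens))    # log(4), about 1.386
print(next_token_loss(confident, tokens))  # close to 0
```

Pre-training drives this number down across a massive corpus. Note that nothing in the objective rewards answering a question; it only rewards predicting what comes next, which is why the pre-trained model just continues the text.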

Check this 👇

2️⃣ Instruction fine-tuning

To make it conversational, we apply instruction fine-tuning, training it on instruction-response pairs. This teaches it to follow prompts and format its replies.

Now it can:
- Answer questions
- Summarize content
- Write code, etc.
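Here's a minimal sketch of how one instruction-response pair might be turned into a training example. The `### Instruction:` / `### Response:` template, the character-level `toy_tokenize`, and the use of `-100` as the "ignore" label are assumptions for illustration (`-100` matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`); real chat models each define their own format and tokenizer.

```python
IGNORE_INDEX = -100  # label value that the loss skips

# Hypothetical prompt template; real chat models each define their own format
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def toy_tokenize(text):
    """Character-level stand-in for a real tokenizer."""
    return [ord(c) for c in text]

def build_example(instruction, response, tokenize=toy_tokenize):
    """Return (input_ids, labels) for one instruction-response pair."""
    prompt_ids = tokenize(TEMPLATE.format(instruction=instruction))
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    # Only response tokens contribute to the loss; prompt positions are masked out
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels

input_ids, labels = build_example("Summarize: LLMs predict tokens.", "They predict the next token.")
```

The training loss is still next-token prediction; the change is the data. Masking the prompt is a common choice (not a universal one), so the model is graded only on how it replies, not on reproducing the question.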

Check this 👇

At this point, we have likely exhausted:

- the raw internet archive and its knowledge.
- the budget for human-labeled instruction-response data.

So what can we do to further improve the model?

We now enter the territory of Reinforcement Learning (RL).

Let's learn next 👇

3️⃣ Preference fine-tuning (PFT)

You’ve probably seen the ChatGPT screen that asks: “Which response do you prefer?”

That’s not just a feedback form; it collects valuable human preference data.

OpenAI uses this data to fine-tune its models through preference fine-tuning.

Check this 👇

In PFT:

The user chooses between 2 responses to produce human preference data.

A reward model is then trained to predict human preferences, and the LLM is updated using RL.
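A common way to train that reward model (and the loss used in InstructGPT-style RLHF) is a pairwise Bradley-Terry loss: for a preferred response scored r_chosen and a rejected one scored r_rejected, minimize -log σ(r_chosen - r_rejected). A minimal sketch, where the scores are plain floats standing in for a reward model's outputs and the function names are hypothetical:

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_reward_loss(r_chosen, r_rejected):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of preference pairs."""
    # -log(sigmoid(d)) == log(1 + exp(-d)) == softplus(-d)
    losses = [softplus(-(c - r)) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)

def preference_probability(r_a, r_b):
    """Modeled probability that a human prefers response A over response B."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))
```

The loss is log 2 when the reward model can't tell a pair apart and shrinks as it scores the human-preferred response higher; only the score difference matters, so the reward scale is arbitrary.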

Check this 👇

The above process is called RLHF (Reinforcement Learning from Human Feedback), and the algorithm typically used to update the model weights is PPO (Proximal Policy Optimization).

It teaches the LLM to align with human preferences even when there’s no "correct" answer.
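PPO updates the policy with a clipped surrogate objective, so a single update can't move too far from the policy that generated the samples. The sketch below shows only that clipped term, with log-probabilities and advantages given as plain floats; a full RLHF setup also needs a value model, advantage estimation, and usually a KL penalty toward the instruction-tuned model, all omitted here. `clip_eps=0.2` is a commonly used value, assumed here.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative mean of PPO's clipped surrogate objective over a batch of samples.

    logp_new: log-probs of the sampled responses under the policy being updated
    logp_old: log-probs of the same samples under the policy that generated them
    advantages: how much better than expected each sample turned out
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new / pi_old
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic term, so large policy jumps earn no extra credit
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)  # minimize the negative objective
```

With identical old and new log-probs the ratio is 1 and the loss is just the negative mean advantage; once the ratio leaves [0.8, 1.2] in the direction the advantage favors, the clipped term caps the gain.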

But we can improve the LLM even more.

Let's learn next👇

4️⃣ Reasoning fine-tuning

In reasoning tasks (maths, logic, etc.), there's usually just one correct response and a defined series of steps to obtain the answer.

So we don’t need human preferences; correctness itself can serve as the signal.

This is called reasoning fine-tuning👇

Steps:

- The model generates an answer to a prompt.
- The answer is compared to the known correct answer.
- Based on the correctness, we assign a reward.

This is called Reinforcement Learning with Verifiable Rewards (RLVR).
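A minimal sketch of such a verifiable reward for math answers. It assumes, hypothetically, that the model is prompted to end its solution with `Answer: <number>`; the regex and the helper names are illustrative, not taken from any particular library.

```python
import re

# Hypothetical output convention: the model ends its solution with "Answer: <number>"
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")

def extract_final_answer(completion):
    """Return the last 'Answer: <number>' value in the completion, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None

def correctness_reward(completion, ground_truth):
    """1.0 if the extracted answer equals the known correct answer, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # unparseable output earns nothing
    return 1.0 if float(answer) == float(ground_truth) else 0.0
```

No human labels the model's outputs; a rule checks them. Exact numeric matching is deliberately simple here, and real reward functions often add format checks or partial credit.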

GRPO (Group Relative Policy Optimization), introduced by DeepSeek, is a popular technique for this.
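At the core of GRPO, as described in DeepSeek's DeepSeekMath paper, several completions are sampled for the same prompt and each one's reward is compared with the rest of its group: advantage = (reward - group mean) / group std. This replaces the separate value model that PPO needs. A minimal sketch of just that step; using the population standard deviation and a small `eps` are assumptions (implementations differ), and the full objective also includes a PPO-style clipped ratio and a KL term.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against the other completions for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption; implementations vary)
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. 4 sampled answers to one math prompt, scored 1.0 (correct) or 0.0 (wrong)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```

Correct answers get pushed up and wrong ones pushed down relative to their siblings. If every completion in a group is equally right or wrong, all advantages are 0 and that prompt provides no learning signal.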

Check this👇

Those were the 4 stages of training an LLM from scratch.

- Start with a randomly initialized model.
- Pre-train it on large-scale corpora.
- Use instruction fine-tuning to make it follow commands.
- Use preference & reasoning fine-tuning to sharpen responses.

Check this 👇

https://twitter.com/703601972/status/1962855614415331485

That's a wrap!

If you found it insightful, reshare with your network.

Find me → @akshay_pachaar ✔️
For more insights and tutorials on LLMs, AI Agents, and Machine Learning!



