Let's generate our own LLM fine-tuning dataset (100% local):
Before we begin, here's what we're doing today!
We'll cover:
- What is instruction fine-tuning?
- Why is it important for LLMs?
Finally, we'll create our own instruction fine-tuning dataset.
Let's dive in!
Once an LLM has been pre-trained, it simply continues the text as if it were completing one long passage from a book or an article.
For instance, check this to understand how a pre-trained LLM behaves when prompted 👇
Generating a synthetic dataset with existing LLMs and using it for fine-tuning can fix this.
The synthetic data contains fabricated examples of human-AI interactions: instructions paired with helpful responses.
Check this sample👇
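To make the format concrete, here is a purely hypothetical example of the kind of record such a dataset contains (the actual samples are generated by the pipeline later in this post):

```python
# A hypothetical instruction-response record (illustrative only):
sample = {
    "instruction": "Explain overfitting in one paragraph.",
    "response": (
        "Overfitting happens when a model memorizes patterns specific to its "
        "training data, including noise, so it performs well on that data but "
        "poorly on new, unseen examples."
    ),
}
```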
This process is called instruction fine-tuning.
Distilabel is an open-source framework that facilitates generating domain-specific synthetic text data using LLMs.
Check this to understand the underlying process👇
Next, let's look at the code.
First, we start with some standard imports.
Check this👇
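As a rough sketch (assuming distilabel 1.x with Ollama support; exact module and step names can differ between versions), the imports look something like this:

```python
# Assumed imports for a distilabel 1.x pipeline backed by a local Ollama server;
# module paths may differ slightly across distilabel versions.
from distilabel.llms import OllamaLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts, GroupColumns
from distilabel.steps.tasks import TextGeneration, UltraFeedback
```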
Moving on, we load the Llama-3 models locally with Ollama.
Here's how we do it👇
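A minimal sketch of this step, assuming the Ollama server is running locally and `ollama pull llama3` has already been executed (the `llama3` model tag is an assumption; use whichever Llama-3 variant you pulled):

```python
# Point distilabel at locally served Llama-3 models via Ollama.
# Two instances will generate candidate responses; a third will judge them.
generator_a = OllamaLLM(model="llama3")
generator_b = OllamaLLM(model="llama3")
judge = OllamaLLM(model="llama3")
```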
Next, we define our pipeline:
- Load dataset.
- Generate two responses.
- Combine the responses into one column.
- Evaluate the responses with an LLM.
- Define and run the pipeline.
Check this👇
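Continuing the sketch above (distilabel 1.x APIs assumed; the seed instructions here are placeholders), the pipeline could be wired up roughly like this:

```python
# A sketch of the pipeline described above; step names (GroupColumns,
# UltraFeedback) follow distilabel 1.x and may vary by version.
with Pipeline(name="synthetic-instruction-data") as pipeline:
    # 1) Load a small seed dataset of instructions (placeholders here).
    load_data = LoadDataFromDicts(
        data=[
            {"instruction": "What is instruction fine-tuning?"},
            {"instruction": "Explain KV caching in simple terms."},
        ]
    )

    # 2) Generate two candidate responses with the two generator LLMs.
    generate_a = TextGeneration(llm=generator_a)
    generate_b = TextGeneration(llm=generator_b)

    # 3) Combine both generations into a single column.
    combine = GroupColumns(
        columns=["generation", "model_name"],
        output_columns=["generations", "model_names"],
    )

    # 4) Rate the combined responses with the judge LLM.
    evaluate = UltraFeedback(llm=judge, aspect="overall-rating")

    # Wire the steps together.
    load_data >> [generate_a, generate_b] >> combine >> evaluate
```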
Once the pipeline has been defined, we need to execute it by giving it a seed dataset.
The seed dataset helps it generate new but similar samples.
Check this code👇
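Running it then boils down to something like the following (continuing the sketch above; in distilabel 1.x, `run()` returns a `Distiset`, a dict-like wrapper over Hugging Face datasets whose exact layout depends on the version and step names):

```python
# Execute the pipeline defined above. The seed instructions act as the
# starting points from which new, similar samples are generated.
distiset = pipeline.run(use_cache=False)

# Inspect the generated instruction-response records and their ratings.
print(distiset)
```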
Done!
This produces the desired synthetic dataset of instruction-response pairs.
Check the sample below👇
Here's the instruction fine-tuning dataset generation process again for your reference.
- Generate responses from two LLMs.
- Rank the responses using another LLM.
- Pick the best-rated response and pair it with the instruction.
Check this👇
For further reading, I covered the 4 stages of training LLMs from scratch in the thread below.
- Google Maps uses graph ML to predict ETA
- Netflix uses graph ML in recommendation
- Spotify uses graph ML in recommendation
- Pinterest uses graph ML in recommendation
Here are 6 must-know ways for graph feature engineering (with code):
Just as image, text, and tabular datasets have features, so do graph datasets.
This means when building models on graph datasets, we can engineer these features to achieve better performance.
Let's discuss some feature engineering techniques below!
First, let’s create a dummy social networking graph dataset with accounts and followers (which will also be accounts).
We create the two DataFrames shown below, an accounts DataFrame and a followers DataFrame.
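Since the original DataFrames were shown as an image, here is a small hypothetical stand-in (column names and values are made up for illustration):

```python
import pandas as pd

# A tiny hypothetical social-network dataset.
# Accounts and their attributes:
accounts_df = pd.DataFrame({
    "account_id": [1, 2, 3, 4],
    "name": ["alice", "bob", "carol", "dave"],
    "n_posts": [120, 45, 300, 8],
})

# Followers: each row is a directed edge saying `follower_id` follows `account_id`.
followers_df = pd.DataFrame({
    "account_id": [1, 1, 2, 3, 3, 4],
    "follower_id": [2, 3, 3, 1, 4, 1],
})

print(accounts_df)
print(followers_df)
```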
"Our GPT model generates 100 tokens in 42 seconds.
How do you make it 5x faster?"
You: "I'll allocate more GPUs for faster generation."
Interview over.
Here's what you missed:
The real bottleneck isn't compute; it's redundant computation.
Without KV caching, the model recomputes the keys and values of every earlier token at each generation step, repeating work it has already done.
- with KV caching → 9 seconds
- without KV caching → 42 seconds (~5x slower)
Let's dive in to understand how it works!
To understand KV caching, we must know how LLMs output tokens.
- Transformer produces hidden states for all tokens.
- Hidden states are projected to the vocab space.
- Logits of the last token are used to generate the next token.
- Repeat for subsequent tokens.
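To make the idea concrete, here is a toy, single-head sketch of KV caching (not a real transformer implementation): at each decoding step, only the new token's key and value are computed and appended to a cache, while attention still spans the full cached prefix.

```python
import numpy as np

d = 8                                     # toy embedding size
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

k_cache, v_cache = [], []                 # grows by one entry per decoded token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attention output for the latest token, reusing cached keys/values."""
    q = x_new @ W_q
    k_cache.append(x_new @ W_k)           # compute K and V only for the new token
    v_cache.append(x_new @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d)           # attend over the cached prefix
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for _ in range(5):                        # pretend we decode 5 tokens
    out = decode_step(rng.normal(size=d))
print(out.shape)                          # (8,)
```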
You're in a Research Scientist interview at OpenAI.
The interviewer asks:
"How would you expand the context length of an LLM from 2K to 128K tokens?"
You: "I will fine-tune the model on longer docs with 128K context."
Interview over.
Here's what you missed:
Extending the context window isn't just about larger matrices.
In a traditional transformer, expanding tokens by 8x increases memory needs by 64x due to the quadratic complexity of attention. Refer to the image below!
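As a quick back-of-the-envelope check on that claim: the attention score matrix has one entry per pair of tokens, so its size grows quadratically with sequence length.

```python
# Attention scores form an n x n matrix, so memory for it scales with n^2.
def attention_matrix_entries(seq_len: int) -> int:
    return seq_len * seq_len

n_short, n_long = 2_048, 8 * 2_048        # 8x more tokens
print(attention_matrix_entries(n_long) / attention_matrix_entries(n_short))  # 64.0
```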
So, how do we manage it?
continue...👇
1) Sparse Attention
It limits the attention computation to a subset of tokens by:
- Using local attention (tokens attend only to their neighbors).
- Letting the model learn which tokens to focus on.
But this comes with a trade-off between computational cost and model quality.
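As a toy illustration of the local-attention idea (window size and shapes are arbitrary here), a sliding-window mask lets each token attend only to itself and a few neighbors to its left, so each row has O(window) allowed entries instead of O(n):

```python
import numpy as np

def local_attention_mask(n: int, window: int) -> np.ndarray:
    """Causal sliding-window mask: True where attention is allowed."""
    i = np.arange(n)[:, None]             # query positions
    j = np.arange(n)[None, :]             # key positions
    return (j <= i) & (i - j <= window)   # causal AND within the local window

print(local_attention_mask(n=8, window=2).astype(int))
```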