Akshay 🚀
Jun 9, 2023
LLMs are everywhere, but do you know how they generate text❓

Let's take the magic out of it and break things down to first principles!

Today I'll explain what conditional probability is and how it relates to LLMs!

A Thread 🧵👇
Before diving into LLMs, let's understand conditional probability.

We consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇
So what is Conditional probability ⁉️

It's a measure of the probability of an event given that another event has occurred.

If the events are A and B, we denote this as P(A|B).

This reads as "probability of A given B"

Check this illustration 👇
For instance, if we're predicting whether it will rain today (event A), knowing that it's cloudy (event B) might impact our prediction.

As it's more likely to rain when it's cloudy, we'd say the conditional probability P(A|B) is high.

That's conditional probability for you! 🎉
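To make the arithmetic concrete, here's a tiny sketch using the 14-person population from above. The thread doesn't give the exact counts of who likes what, so the numbers below are assumptions chosen purely for illustration:

```python
# Conditional probability on the 14-person example.
# NOTE: the exact counts are illustrative assumptions, not from the thread.
total = 14
likes_tennis = 6        # 🎾 (assumed)
likes_football = 8      # ⚽️ (assumed)
likes_both = 3          # 🎾 ⚽️ (assumed)

# P(A|B) = P(A and B) / P(B)
p_football = likes_football / total
p_tennis_and_football = likes_both / total

p_tennis_given_football = p_tennis_and_football / p_football
print(f"P(tennis | football) = {p_tennis_given_football:.2f}")  # 3/8 = 0.38
```

In words: if we already know someone likes football, the chance they also like tennis is 3 out of 8, not 6 out of 14. Conditioning on B shrinks the world we reason about.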
Now, how does this apply to LLMs like GPT-4❓

These models are tasked with predicting the next word in a sequence.

This is a question of conditional probability: given the words that have come before, what is the most likely next word?
To predict the next word, the model calculates the conditional probability for each possible next word, given the previous words (context).

In the simplest (greedy) decoding strategy, the word with the highest conditional probability is chosen as the prediction; in practice the next word is often sampled from this distribution instead.
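Here's a rough sketch of that last step: the model outputs a score (logit) for every word in its vocabulary, and a softmax turns those scores into a conditional distribution. The vocabulary and numbers below are made up for illustration:

```python
import numpy as np

# Toy next-word prediction for the context "The cat sat on the ..."
vocab  = ["mat", "dog", "moon", "sofa"]
logits = np.array([4.1, 0.5, -1.2, 2.3])   # made-up model scores

# Softmax turns logits into P(next word | context)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(vocab, probs):
    print(f"P({word!r} | context) = {p:.3f}")

# Greedy decoding: pick the highest-probability word
print("prediction:", vocab[int(np.argmax(probs))])
```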
The LLM learns a high-dimensional probability distribution over sequences of words.

And the parameters of this distribution are the trained weights!
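Concretely, "a distribution over sequences" factorizes via the chain rule of probability (a standard identity, independent of any particular model):

P(w_1, w_2, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})

Each factor on the right is exactly the next-word conditional probability the model learns to predict.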

The training, or rather pre-training, is self-supervised: the labels (the next words) come straight from the raw text, with no human annotation needed.

I'll talk about the different training steps next time!

Check this 👇
Hopefully, this thread has demystified a bit of the magic behind LLMs and the concept of conditional probability.

If you want to learn more about building with LLMs, @LightningAI has some great resources on this!

Check this out👇
lightning.ai/pages/blog/
That's a wrap!

If you're interested in:

- Python 🐍
- Data Science 📈
- Machine Learning 🤖
- MLOps 🛠
- NLP 🗣
- Computer Vision 🎥
- LLMs 🧠

I'm sharing daily content over here, follow me → @akshay_pachaar if you haven't already!!

Cheers! 🥂


More from @akshay_pachaar

Sep 7
8 key skills to become a full-stack AI Engineer:
Production-grade AI systems demand deep understanding of how LLMs are engineered, deployed, and optimized.

Here are the 8 pillars that define serious LLM development:

Let's dive in! 🚀
1️⃣ Prompt engineering

Prompt engineering is far from dead!

The key is to craft structured prompts that reduce ambiguity and lead to consistent, predictable outputs.

Treat it as engineering, not copywriting! ⚙️

Here's something I published on JSON prompting:
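For a flavour of what that can look like, here's a minimal hypothetical sketch (not the published piece referenced above; the task and field names are made up):

```python
import json

# A JSON-style prompt: role, task, constraints and output schema are pinned
# down explicitly, leaving far less room for ambiguity than free-form text.
prompt = {
    "role": "You are a support-ticket triage assistant.",
    "task": "Classify the ticket and extract key fields.",
    "constraints": [
        "Respond with JSON only",
        "category must be one of: bug, billing, feature_request",
    ],
    "output_schema": {"category": "string", "summary": "string", "urgency": "low|medium|high"},
    "ticket": "The app crashes every time I upload a PDF larger than 10 MB.",
}

print(json.dumps(prompt, indent=2))  # send this string to the LLM of your choice
```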
Sep 6
K-Means has two major problems:

- The number of clusters must be known
- It doesn't handle outliers

Here’s an algorithm that addresses both issues:
Introducing DBSCAN, a density-based clustering algorithm.

Simply put, DBSCAN groups together points in a dataset that are close to each other based on their spatial density.

It's very easy to understand; just follow along ...👇
DBSCAN has two important parameters.

1️⃣ Epsilon (eps):

`eps` represents the maximum distance between two points for them to be considered part of the same neighbourhood.

Points within this distance of each other are considered to be neighbours.

Check this out 👇
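For reference, here's what this looks like in code with scikit-learn. The dataset and parameter values are illustrative; `min_samples`, the number of neighbours needed to form a dense region, is DBSCAN's other key parameter:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy dataset with two crescent-shaped clusters plus noise:
# a case where K-Means struggles but DBSCAN does well.
X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)

# eps: max distance for two points to count as neighbours
# min_samples: neighbours required for a point to be a "core" point
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                      # -1 marks outliers/noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("points flagged as noise:", (labels == -1).sum())
```

Notice that neither the number of clusters nor the outliers had to be specified up front: both fall out of the density structure of the data.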
Sep 4
Let's build a reasoning LLM, from scratch (100% local):
Today, we're going to learn how to turn any model into a reasoning powerhouse.

We'll do so without any labeled data or human intervention, using reinforcement fine-tuning with GRPO!

Tech stack:

- @UnslothAI for efficient fine-tuning
- @HuggingFace TRL to apply GRPO

Let's go! 🚀
What is GRPO?

Group Relative Policy Optimization is a reinforcement learning method that fine-tunes LLMs for math and reasoning tasks using deterministic reward functions, eliminating the need for labeled data.

Here's a brief overview of GRPO before we jump into code:
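As a taste of the "deterministic reward" idea, here's a minimal sketch of a rule-based reward function for math answers. The function name and signature are illustrative, not TRL's exact GRPO interface:

```python
import re

# Deterministic reward: 1.0 if the last number in the completion matches the
# known final answer, 0.0 otherwise. No human preference labels needed.
def correctness_reward(completions, answers):
    rewards = []
    for completion, answer in zip(completions, answers):
        numbers = re.findall(r"-?\d+\.?\d*", completion)
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == str(answer) else 0.0)
    return rewards

print(correctness_reward(
    ["The total is 3 + 4 = 7", "I think the answer is 9"],
    [7, 7],
))  # [1.0, 0.0]
```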
Sep 2
4 stages of training LLMs from scratch, clearly explained (with visuals):
Today, we are covering the 4 stages of building LLMs from scratch to make them applicable for real-world use cases.

We'll cover:
- Pre-training
- Instruction fine-tuning
- Preference fine-tuning
- Reasoning fine-tuning

The visual summarizes these techniques.

Let's dive in!
0️⃣ Randomly initialized LLM

At this point, the model knows nothing.

You ask it “What is an LLM?” and get gibberish like “try peter hand and hello 448Sn”.

It hasn’t seen any data yet and possesses just random weights.

Check this 👇
Aug 30
A new embedding model cuts vector DB costs by ~200x.

It also outperforms OpenAI and Cohere models.

Let's understand how you can use it in LLM apps (with code):
Today, we'll use the voyage-context-3 embedding model by @VoyageAI to do RAG over audio data.

We'll also use:
- @MongoDB Atlas Vector Search as vector DB
- @AssemblyAI for transcription
- @llama_index for orchestration
- gpt-oss as the LLM

Let's begin!
For context...

voyage-context-3 is a contextualized chunk embedding model that produces chunk embeddings with full document context.

This is unlike common chunk embedding models that embed chunks independently.

(We'll discuss the results later in the thread)

Check this👇
Aug 29
I have been training neural networks for 10 years now.

Here are 16 techniques I actively use to optimize model training:
Before we dive in, the following visual covers what we are discussing today.

Let's understand them in detail below!
These are some basic techniques:

1) Use efficient optimizers—AdamW, Adam, etc.

2) Utilize hardware accelerators (GPUs/TPUs).

3) Max out the batch size.

4) Use multi-GPU training through Model/Data/Pipeline/Tensor parallelism. Check the visual👇
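As a minimal illustration of the first three points, here's a sketch of a single training step in PyTorch: an efficient optimizer (AdamW), a hardware accelerator if one is available, and a large batch. The model and data are placeholders:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"   # use the GPU when present

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1024, 512, device=device)          # large batch of fake inputs
y = torch.randint(0, 10, (1024,), device=device)   # fake labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"one training step done, loss = {loss.item():.3f}")
```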