Avi Chawla
Jul 8 · 13 tweets · 4 min read
How LLMs work, clearly explained (with visuals):
Before diving into LLMs, we must understand conditional probability.

Let's consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇 Image
So what is conditional probability?

It's a measure of the probability of an event given that another event has occurred.

If the events are A and B, we denote this as P(A|B).

This reads as "probability of A given B"

Check this illustration👇 Image
For instance, if we're predicting whether it will rain today (event A), knowing that it's cloudy (event B) might impact our prediction.

Since rain is more likely when it's cloudy, the conditional probability P(A|B) is higher than the plain probability of rain, P(A).

That's conditional probability!
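To make this concrete, here is a minimal sketch using the 14-person population. The image with the actual counts is not reproduced here, so the split below (4 tennis only, 5 football only, 3 both, 2 neither) is an assumption picked only so the numbers add up to 14.

```python
# Hypothetical counts: these are assumptions, not the numbers from the image.
tennis_only = 4
football_only = 5
both = 3
neither = 2

total = tennis_only + football_only + both + neither  # 14 people

# P(Football): fraction who like football (with or without tennis).
p_football = (football_only + both) / total

# P(Tennis and Football): fraction who like both.
p_tennis_and_football = both / total

# P(Tennis | Football) = P(Tennis and Football) / P(Football)
p_tennis_given_football = p_tennis_and_football / p_football

print(f"P(Tennis | Football) = {p_tennis_given_football:.3f}")
```

With these assumed counts, 3 of the 8 football fans also like tennis, so P(Tennis | Football) = 3/8.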
Now, how does this apply to LLMs like GPT-4?

These models are tasked with predicting/guessing the next word in a sequence.

This is a question of conditional probability: given the words that have come before, what is the most likely next word? Image
To predict the next word, the model calculates the conditional probability for each possible next word, given the previous words (context).

The word with the highest conditional probability is chosen as the prediction. Image
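A toy way to see this is a bigram model that estimates P(next word | previous word) from counts. This is only a sketch: the corpus is made up, and a real LLM conditions on the whole context with a neural network rather than on one previous word with a lookup table.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Return P(next | prev) estimated from the counts."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def greedy_next(prev):
    """Pick the next word with the highest conditional probability."""
    probs = next_word_probs(prev)
    return max(probs, key=probs.get)

print(next_word_probs("the"))  # "cat" follows "the" in 2 of 4 cases
print(greedy_next("the"))
```

Here "the" is followed by "cat" twice, "mat" once, and "fish" once, so the greedy pick after "the" is "cat".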
The LLM learns a high-dimensional probability distribution over sequences of words.

And the parameters of this distribution are the trained weights!

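The training (or rather pre-training) is self-supervised: the text itself supplies the labels, since the target at each position is simply the next word.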

I'll talk about the different training steps next time!

Check this 👇 Image
But there is a problem!

If we always pick the word with the highest probability, the same prompt always gives the same output, and the text tends to get repetitive. That makes LLMs far less useful and stifles their creativity.

This is where temperature comes into the picture.

Check this before we understand more about it...👇 Image
However, a high temperature value produces gibberish output.

Let's understand what's going on...👇 Image
So, instead of always selecting the highest-scoring token (for simplicity, let's think of tokens as words), LLMs "sample" the prediction from the probability distribution.

So even if “Token 1” has the highest score, it may not be chosen since we are sampling. Image
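Here is a small sketch of the difference between greedy selection and sampling. The three probabilities are assumed values for illustration, not taken from any model, and the seed is fixed only to make runs repeatable.

```python
import random

# Hypothetical next-token probabilities for some context (assumed values).
tokens = ["Token 1", "Token 2", "Token 3"]
probs = [0.6, 0.3, 0.1]

def greedy(tokens, probs):
    """Always return the token with the highest probability."""
    return tokens[probs.index(max(probs))]

def sample(tokens, probs, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed so runs are repeatable
draws = [sample(tokens, probs, rng) for _ in range(1000)]

print("greedy:", greedy(tokens, probs))
for t in tokens:
    # Observed frequency of each token across the 1000 sampled draws.
    print(t, draws.count(t) / len(draws))
```

Greedy selection returns "Token 1" every time, while the sampled draws include the other tokens too, roughly in proportion to their probabilities.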
Now, temperature introduces the following tweak in the softmax function, which, in turn, influences the sampling process: Image
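The formula in the image is not reproduced here. The standard temperature-scaled softmax, which is presumably what it showed, divides each score by the temperature before normalizing. The symbols below are my own notation: z_i is the model's raw score (logit) for token i, and T > 0 is the temperature.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

With T = 1 this is the ordinary softmax. T < 1 stretches the gaps between scores, and T > 1 shrinks them.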
Let's take a code example!

At low temperature, probabilities concentrate around the most likely token, resulting in nearly greedy generation.

At high temperature, probabilities become more uniform, so unlikely tokens get picked more often and the output becomes much more random.

Check this out👇 Image
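The code in the image is not reproduced here, so this is a minimal sketch of the same idea rather than the original snippet. The function name and the three logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 0.1 almost all the probability lands on the first token (nearly greedy), and at T = 10 the three probabilities are close to each other.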
That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
