
Avi Chawla

@_avichawla

Jul 8 • 13 tweets • 4 min read


How LLMs work, clearly explained (with visuals):

Before diving into LLMs, we must understand conditional probability.

Let's consider a population of 14 individuals:

- Some of them like Tennis 🎾
- Some like Football ⚽️
- A few like both 🎾 ⚽️
- And a few like neither

Here's how it looks 👇

So what is conditional probability?

It's a measure of the probability of an event given that another event has occurred.

If the events are A and B, we denote this as P(A|B).

This reads as "probability of A given B"
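To make the definition concrete, here is a small sketch using the 14-person population. The thread only gives the total, so the split between the four groups below is an assumption chosen for illustration, and the variable names are hypothetical. It applies the standard identity P(A|B) = P(A and B) / P(B).

```python
from fractions import Fraction

# Hypothetical split of the 14 individuals (only the total comes from the thread).
only_tennis = 4
only_football = 5
both = 3
neither = 2
total = only_tennis + only_football + both + neither  # 14

# P(B): probability that a person likes football.
p_football = Fraction(only_football + both, total)

# P(A and B): probability that a person likes both tennis and football.
p_both = Fraction(both, total)

# P(A|B) = P(A and B) / P(B): probability of liking tennis, given football.
p_tennis_given_football = p_both / p_football

print(p_tennis_given_football)  # 3/8
```

With these assumed counts, 8 people like football and 3 of them also like tennis, so restricting attention to the football group gives 3/8.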

Check this illustration👇

For instance, if we're predicting whether it will rain today (event A), knowing that it's cloudy (event B) might impact our prediction.

As it's more likely to rain when it's cloudy, we'd say the conditional probability P(A|B) is high.

That's conditional probability!

Now, how does this apply to LLMs like GPT-4?

These models are tasked with predicting/guessing the next word in a sequence.

This is a question of conditional probability: given the words that have come before, what is the most likely next word?

To predict the next word, the model calculates the conditional probability for each possible next word, given the previous words (context).

The word with the highest conditional probability is chosen as the prediction.
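As a rough sketch of this greedy choice, assume the model has already produced conditional probabilities for a handful of candidate words. The context string, the candidate words, and the probabilities below are all made up for illustration; a real model scores its entire vocabulary.

```python
# Hypothetical values of P(next word | "The cat sat on the").
next_word_probs = {"mat": 0.55, "sofa": 0.25, "roof": 0.15, "moon": 0.05}


def greedy_next_word(probs):
    """Return the word with the highest conditional probability."""
    return max(probs, key=probs.get)


print(greedy_next_word(next_word_probs))  # mat
```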

The LLM learns a high-dimensional probability distribution over sequences of words.

And the parameters of this distribution are the trained weights!

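The training (or rather pre-training) is self-supervised: the next word in the training text itself serves as the label, so no manual annotation is needed.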

I'll talk about the different training steps next time!

Check this 👇

But there is a problem!

If we always pick the word with the highest probability, we end up with repetitive outputs, making LLMs almost useless and stifling their creativity.

This is where temperature comes into the picture.

Check this before we understand more about it...👇

However, a high temperature value produces gibberish output.

Let's understand what's going on...👇

So, instead of always selecting the best token (for simplicity, let's think of tokens as words), LLMs "sample" the prediction from the probability distribution.

So even if “Token 1” has the highest score, it may not be chosen since we are sampling.
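A minimal sketch of the difference between picking and sampling, using Python's standard `random.choices`. The three token names and their probabilities are assumptions for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random
from collections import Counter

tokens = ["Token 1", "Token 2", "Token 3"]
probs = [0.6, 0.3, 0.1]  # hypothetical model probabilities

rng = random.Random(0)  # seeded so the experiment is repeatable

# Draw 1000 next-token predictions in proportion to their probabilities.
samples = rng.choices(tokens, weights=probs, k=1000)
counts = Counter(samples)

print(counts)
```

"Token 1" should come up most often, but the other tokens are still drawn a meaningful fraction of the time, which is what breaks the repetition of greedy decoding.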

Now, temperature introduces the following tweak in the softmax function, which, in turn, influences the sampling process:
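In its usual textbook form, temperature-scaled softmax divides each score by the temperature before normalizing. The symbols here are notation assumed for this sketch: z_i is the model's raw score (logit) for token i, T > 0 is the temperature, and p_i is the resulting sampling probability.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```

Setting T = 1 recovers the ordinary softmax.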

Let's take a code example!
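Here is a minimal sketch of such an example in plain Python. The function name `softmax_with_temperature` and the logits are hypothetical, chosen only to show how the same scores turn into different distributions as the temperature changes.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens

# Print the distribution at a low, a neutral, and a high temperature.
for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```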

At low temperature, probabilities concentrate around the most likely token, resulting in nearly greedy generation.

At high temperature, probabilities become more uniform, producing highly random outputs.

Check this out👇

https://twitter.com/1175166450832687104/status/1942472125484523605

That's a wrap!

If you found it insightful, reshare it with your network.

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

