Josep Ferrer
Jul 28, 2024 • 10 tweets • 3 min read
The Transformer architecture, clearly explained 👇🏻
Today I'm starting a new series of threads to simplify the concept of Transformers and what's behind the natural-language abilities of LLMs.

Let's start with the basics of the Transformer architecture:

The encoder/decoder concept. 🧠✨
1๏ธโƒฃ ๐—ช๐—›๐—”๐—ง ๐—œ๐—ฆ ๐—” ๐—ง๐—ฅ๐—”๐—ก๐—ฆ๐—™๐—ข๐—ฅ๐— ๐—˜๐—ฅ?
A Transformer is a neural network that excels at understanding the context of sequential data and generating new data from it.

They are the first to rely solely on self-attention, without using RNNs or convolution. Image
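To make "self-attention" concrete, here's a minimal pure-Python sketch of scaled dot-product attention over toy 2-d token vectors. In a real Transformer, Q, K, and V come from learned projection matrices and the vectors have hundreds of dimensions; this just shows the mechanism.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product self-attention.

    Q, K, V are lists of equal-length vectors (one per token).
    Each output vector is a weighted mix of ALL value vectors,
    so every token's representation reflects the whole sequence.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)  # attention weights, sum to 1
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# three toy token embeddings; here Q = K = V for simplicity
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_aware = self_attention(tokens, tokens, tokens)
```

No recurrence, no convolution: every token attends to every other token in one shot, which is exactly the design choice the thread highlights.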
2๏ธโƒฃ ๐—ง๐—ฅ๐—”๐—ก๐—ฆ๐—™๐—ข๐—ฅ๐— ๐—˜๐—ฅ ๐—”๐—ฆ ๐—” ๐—•๐—Ÿ๐—”๐—–๐—ž ๐—•๐—ข๐—ซ
Imagine a Transformer for language translation as a BLACK BOX. ๐ŸŽฉ
โ€ข Input: A sentence in one language.
โ€ข Output: Its translation.

But what happens inside this black box? Let's find out! ๐Ÿ” Image
3๏ธโƒฃ ๐—˜๐—ก๐—–๐—ข๐——๐—˜๐—ฅ/๐——๐—˜๐—–๐—ข๐——๐—˜๐—ฅ architecture
โ€ข Input: Spanish sentence ยฟDe quiรฉn es?
โ€ข Encoder: Transforms it into a structured format capturing its essence.
โ€ข Decoder: Receives this encoded data and generates the translation.
โ€ข Output: The translated sentence: Whose is it? Image
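The encode-then-decode flow above can be sketched as a toy pipeline. Both functions here are hypothetical stand-ins (a real encoder outputs one context-aware vector per token, and a real decoder predicts tokens with a trained network); the point is the shape of the loop, where the decoder generates one token at a time while consulting the encoder's output.

```python
def encode(source_tokens):
    """Stand-in encoder: maps input tokens to a 'structured format'.
    Here we just tag each token so the data flow is visible."""
    return [("ctx", tok) for tok in source_tokens]

def decode(memory, next_token):
    """Stand-in decoder: generates output autoregressively,
    consulting the encoder's memory at every step."""
    output = []
    token = "<start>"
    while token != "<end>":
        token = next_token(memory, output)  # next-token prediction
        if token != "<end>":
            output.append(token)
    return output

# hypothetical next-token rule standing in for a trained decoder
TRANSLATION = ["Whose", "is", "it?"]
def toy_step(memory, so_far):
    return TRANSLATION[len(so_far)] if len(so_far) < len(TRANSLATION) else "<end>"

memory = encode(["¿De", "quién", "es?"])
print(decode(memory, toy_step))  # → ['Whose', 'is', 'it?']
```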
4๏ธโƒฃ ๐—ง๐—›๐—˜ ๐—”๐—ฅ๐—–๐—›๐—œ๐—ง๐—˜๐—–๐—ง๐—จ๐—ฅ๐—˜ BEHIND THE TRANSFORMERS
Each encoder and decoder is made up of layers. Here's how they work:
โ€ข Encoders: Process the input sequentially, layer by layer.
โ€ข Decoders: Take the encoded data and generate the output step by step. Image
Both use self-attention and feed-forward neural networks, enabling the generation of natural language.
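The layer-by-layer structure can be sketched in a few lines. This is a deliberately toy version: `mix` averages positions as a stand-in for real learned self-attention, `feed_forward` uses scalar weights instead of learned matrices, and layer normalization is omitted for brevity. What it does show faithfully is the two sub-layers per encoder layer, each wrapped in a residual (add) connection, stacked N times (6 in the original paper).

```python
def feed_forward(x, w1=2.0, w2=0.5):
    """Toy position-wise feed-forward net: expand, ReLU, project back.
    Real layers use two learned weight matrices; scalars stand in here."""
    hidden = [max(0.0, xi * w1) for xi in x]  # ReLU non-linearity
    return [hi * w2 for hi in hidden]

def mix(vectors):
    """Stand-in for self-attention: blend every position into each one,
    so each token's vector picks up context from the whole sequence."""
    n = len(vectors)
    avg = [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
    return [avg for _ in vectors]

def encoder_layer(xs):
    """One encoder layer: self-attention sub-layer, then feed-forward,
    each with a residual (add) connection."""
    attended = mix(xs)
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attended)]
    return [[a + b for a, b in zip(x, feed_forward(x))] for x in xs]

def encoder(xs, n_layers=6):
    """Encoders process the input layer by layer; 6 is the paper's depth."""
    for _ in range(n_layers):
        xs = encoder_layer(xs)
    return xs
```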

Tomorrow we'll break down the architecture of these two core elements: the encoder and the decoder.
Do you want to understand the Transformer architecture?
Then go check my latest article about Transformers 👇🏻

aigents.co/data-science-b…
If you are interested in...
• Python 🐍
• SQL 💾
• ML/MLOps 🛠
• LLMs & NLP 🗣
• DataViz 📊
• AI Engineering ⚙️

Then follow me → @rfeers
Did you like this post?

Then join my freshly started DataBites newsletter to get all my content right in your inbox every week! 🧩

👉🏻 databites.tech