Dr Alex Young ⚡️
Aug 9 · 20 tweets · 5 min read
I just read "Foundations of LLMs 2025" cover to cover.

It explained large language models so clearly that I can finally say: I get it.

Here’s the plain-English breakdown I wish I had years ago:
To understand LLMs, start with pre-training.

We don’t teach them specific tasks.

We flood them with raw text and let them discover patterns on their own.

This technique is called self-supervised learning and it’s the foundation of everything.
There are 3 ways to pre-train:

→ Unsupervised: No labels at all
→ Supervised: Classic labeled data
→ Self-supervised: Model creates its own labels (e.g., “guess the missing word”)

LLMs use #3: it scales like crazy and teaches them language from scratch.
Example of self-supervised learning:

“The early bird catches the worm.”

Mask some words:

→ “The [MASK] bird catches the [MASK]”

The model’s job? Fill in the blanks.

No human labels. The text is the supervision.
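A minimal Python sketch of that trick (a hypothetical helper; whitespace tokenization and a 30% mask rate are simplifications for illustration):

```python
import random

def make_mlm_example(sentence, mask_rate=0.3):
    """Turn raw text into a (masked input, targets) training pair.

    The labels come from the text itself, so no human annotation
    is needed. Real pipelines mask ~15% of subword tokens rather
    than whole whitespace-separated words.
    """
    tokens = sentence.split()
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the hidden word becomes the label
        else:
            masked.append(tok)
    return " ".join(masked), targets

inp, labels = make_mlm_example("The early bird catches the worm")
print(inp)     # e.g. "The [MASK] bird catches the [MASK]"
print(labels)  # e.g. {1: 'early', 5: 'worm'}
```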
This leads to 3 main model types:

→ Encoder-only (BERT): Understands text
→ Decoder-only (GPT): Generates next word
→ Encoder-decoder (T5): Translates input to output

Each has strengths. Think of them as different tools for different jobs.
Let’s break it down further.

Decoder-only (GPT-style):

Trained to guess the next word:

“The cat sat on the ___” → “mat”

This is called causal language modeling.
The loss measures how wrong the guesses are (cross-entropy).
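A toy illustration in Python (made-up probabilities, just to show how cross-entropy punishes bad guesses):

```python
import math

# Made-up next-word probabilities a model might assign after
# "The cat sat on the":
probs = {"mat": 0.6, "floor": 0.2, "roof": 0.1, "moon": 0.1}

# Cross-entropy for the correct next word: the less probability
# the model puts on the truth, the bigger the loss.
print(-math.log(probs["mat"]))   # ~0.51 (confident and right: low loss)
print(-math.log(probs["moon"]))  # ~2.30 (if "moon" were correct: high loss)
```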
Encoder-only (BERT-style):

Takes the whole sentence.
Randomly hides some words and tries to reconstruct them.

This is masked language modeling: it uses both left and right context.

Great for understanding, not generation.
Example:

Original:
→ “The early bird catches the worm”

Masked:
→ “The [MASK] bird catches the [MASK]”

The model predicts “early” and “worm” by understanding the whole sentence.

It’s learning language by solving puzzles.
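You can watch this happen in a few lines, assuming the Hugging Face transformers library is installed (downloads bert-base-uncased on first run):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT sees context on BOTH sides of the blank when it guesses.
for pred in fill("The early bird catches the [MASK].")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
# Expect "worm" at or near the top.
```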
Encoder-decoder (T5, BART):

Treats everything as a text-to-text task.

Examples:

“Translate English to German: hello” → “hallo”
“Sentiment: I hate this” → “negative”

This setup lets one model do it all: QA, summarization, translation, etc.
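A quick sketch, assuming transformers (plus sentencepiece) and the t5-small checkpoint — a small model, so outputs are rough, but the one-model-many-tasks pattern is visible:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Different tasks, same model: the task is just part of the input text.
for prompt in ["translate English to German: hello",
               "summarize: The early bird catches the worm because it shows up first."]:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))
```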
Once pre-trained, we have two options:

→ Fine-tune it on a labeled dataset
→ Prompt it cleverly to do new tasks

Fine-tuning adjusts weights.
Prompting? Just tweaks the input text.

Let’s dive into the magic of prompts.
Prompting = carefully phrasing input so the model does what you want.

Example:

“I love this movie. Sentiment:”

It’ll likely respond: “positive”

Add a few examples before it? That’s in-context learning: no fine-tuning needed.
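Here's what that few-shot prompt looks like (hypothetical examples):

```python
# The "training data" lives entirely inside the prompt.
# No weights change; the model infers the pattern from the examples.
prompt = """Review: This film was a waste of two hours.
Sentiment: negative

Review: Best soundtrack I've heard all year.
Sentiment: positive

Review: I love this movie.
Sentiment:"""

# Send `prompt` to any decoder-only LLM and it will very likely
# continue with "positive".
```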
Prompting gets deep.

Advanced strategies:

• Chain of thought → “Let’s think step by step...”
• Decomposition → Break complex tasks into parts
• Self-refinement → Ask the model to critique itself
• RAG → Let it fetch real-time data from external sources
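For a taste of the first one, here's a chain-of-thought cue on a classic trick question (hypothetical prompts; behavior varies by model):

```python
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

direct_prompt = f"Q: {question}\nA:"
cot_prompt    = f"Q: {question}\nA: Let's think step by step."

# With the direct prompt, models often blurt out "$0.10" (wrong).
# With the step-by-step cue, they tend to write out the algebra
# and land on the correct answer: $0.05.
```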
This is all possible because of the way these models are trained: predict the next word over and over until they internalize language structure, reasoning patterns, and world knowledge.

It's not magic. It's scale.
But raw intelligence isn’t enough.

We need models to align with human goals.

That’s where alignment comes in.

It happens in two major phases after pre-training 👇
Supervised Fine-Tuning (SFT)

Feed the model good human responses. Let it learn how we want it to reply.

RLHF (Reinforcement Learning from Human Feedback)

Train a reward model to prefer helpful answers. Use it to steer the LLM.

This is how ChatGPT was aligned.
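A sketch of the pairwise loss typically used to train that reward model, assuming PyTorch (toy scores, not from a real model):

```python
import torch
import torch.nn.functional as F

def reward_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry style) loss: push the score of the
    human-preferred answer above the score of the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores the reward model assigned to two answer pairs:
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.4, -0.1])
print(reward_loss(r_chosen, r_rejected))  # ~0.44; shrinks as the gap grows
```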

RLHF is powerful but tricky.

Newer methods like Direct Preference Optimization (DPO) are rising fast.

Why?
They skip the separate reward model and the unstable RL loop, optimizing directly on preference pairs.

More stable. More scalable.
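Here's roughly what the DPO objective looks like in PyTorch (a sketch of the standard formulation, not a training-ready implementation):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization (sketch).

    pi_*  : policy log-probs of the preferred / dispreferred responses
    ref_* : same log-probs under the frozen reference model
    No separate reward model and no RL loop: preferences are
    optimized directly, which is why DPO trains more stably.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs for one preference pair:
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```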
Inference (how the model runs) is just as important as training.

To serve real-time outputs, we use tricks like:

→ Top-k / nucleus sampling
→ Caching past tokens
→ Batching requests
→ Memory-efficient attention

These make LLMs usable at scale.
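Here's a small sketch of nucleus (top-p) sampling in plain NumPy, with a toy vocabulary and made-up probabilities:

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p sampling: keep the smallest set of words whose total
    probability reaches p, renormalize, and sample within that set.
    Trims the long tail of junk words without the rigidity of a
    fixed top-k."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]        # most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    return rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum())

vocab = np.array(["mat", "floor", "roof", "moon"])
probs = np.array([0.6, 0.2, 0.1, 0.1])
print(vocab[nucleus_sample(probs)])  # usually "mat"; "moon" gets cut
```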
So how do LLMs really work?

→ Trained on massive text
→ Predict the next word millions of times
→ Use Transformers to encode dependencies
→ Adapt via prompting or fine-tuning
→ Aligned to human preferences
→ Served with smart inference
This was based on the brilliant textbook:

"Foundations of Large Language Models" by Tong Xiao and Jingbo Zhu (NiuTrans Research Lab)

arxiv: arxiv.org/abs/2501.09223

Highly recommend it if you're serious about understanding LLMs deeply.
P.S.

We built ClipYard for ruthless performance marketers.

→ Better ROAS
→ 10x faster content ops
→ No human error
→ Full creative control

You’ve never seen AI avatars like this before → clipyard.ai
I hope you've found this thread helpful.

Follow me @AlexanderFYoung for more.

