Dr Alex Young ⚡️
Aug 9 · 20 tweets · 5 min read
I just read "Foundations of LLMs 2025" cover to cover.

It explained large language models so clearly that I can finally say: I get it.

Here’s the plain-English breakdown I wish I had years ago:
To understand LLMs, start with pre-training.

We don’t teach them specific tasks.

We flood them with raw text and let them discover patterns on their own.

This technique is called self-supervised learning and it’s the foundation of everything.
There are 3 ways to pre-train:

→ Unsupervised: No labels at all
→ Supervised: Classic labeled data
→ Self-supervised: Model creates its own labels (e.g., “guess the missing word”)

LLMs use #3: it scales like crazy and teaches them language from scratch.
Example of self-supervised learning:

“The early bird catches the worm.”

Mask some words:

→ “The [MASK] bird catches the [MASK]”

The model’s job? Fill in the blanks.

No human labels. The text is the supervision.
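A minimal Python sketch of that trick (a hypothetical helper; whitespace tokenization and a 30% mask rate are simplifications for illustration):

```python
import random

def make_mlm_example(sentence, mask_rate=0.3):
    """Turn raw text into a (masked input, targets) training pair.

    The labels come from the text itself, so no human annotation
    is needed. Real pipelines mask ~15% of subword tokens rather
    than whole whitespace-separated words.
    """
    tokens = sentence.split()
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the hidden word becomes the label
        else:
            masked.append(tok)
    return " ".join(masked), targets

inp, labels = make_mlm_example("The early bird catches the worm")
print(inp)     # e.g. "The [MASK] bird catches the [MASK]"
print(labels)  # e.g. {1: 'early', 5: 'worm'}
```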
This leads to 3 main model types:

→ Encoder-only (BERT): Understands text
→ Decoder-only (GPT): Generates next word
→ Encoder-decoder (T5): Translates input to output

Each has strengths. Think of them as different tools for different jobs.
Let’s break it down further.

Decoder-only (GPT-style):

Trained to guess the next word:

“The cat sat on the ___” → “mat”

This is called causal language modeling.
The loss measures how wrong the guesses are (cross-entropy).
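A toy illustration in Python (made-up probabilities, just to show how cross-entropy punishes bad guesses):

```python
import math

# Made-up next-word probabilities a model might assign after
# "The cat sat on the":
probs = {"mat": 0.6, "floor": 0.2, "roof": 0.1, "moon": 0.1}

# Cross-entropy for the correct next word: the less probability
# the model puts on the truth, the bigger the loss.
print(-math.log(probs["mat"]))   # ~0.51 (confident and right: low loss)
print(-math.log(probs["moon"]))  # ~2.30 (if "moon" were correct: high loss)
```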
Encoder-only (BERT-style):

Takes the whole sentence.
Randomly hides some words and tries to reconstruct them.

This is masked language modeling: it uses both left and right context.

Great for understanding, not generation.
Example:

Original:
→ “The early bird catches the worm”

Masked:
→ “The [MASK] bird catches the [MASK]”

The model predicts “early” and “worm” by understanding the whole sentence.

It’s learning language by solving puzzles.
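You can watch this happen in a few lines, assuming the Hugging Face transformers library is installed (downloads bert-base-uncased on first run):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT sees context on BOTH sides of the blank when it guesses.
for pred in fill("The early bird catches the [MASK].")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
# Expect "worm" at or near the top.
```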
Encoder-decoder (T5, BART):

Treats everything as a text-to-text task.

Examples:

“Translate English to German: hello” → “hallo”
“Sentiment: I hate this” → “negative”

This setup lets one model do it all: QA, summarization, translation, etc.
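A quick sketch, assuming transformers (plus sentencepiece) and the t5-small checkpoint — a small model, so outputs are rough, but the one-model-many-tasks pattern is visible:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Different tasks, same model: the task is just part of the input text.
for prompt in ["translate English to German: hello",
               "summarize: The early bird catches the worm because it shows up first."]:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))
```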
Once pre-trained, we have two options:

→ Fine-tune it on a labeled dataset
→ Prompt it cleverly to do new tasks

Fine-tuning adjusts weights.
Prompting? Just tweaks the input text.

Let’s dive into the magic of prompts.
Prompting = carefully phrasing input so the model does what you want.

Example:

“I love this movie. Sentiment:”

It’ll likely respond: “positive”

Add a few examples before it? That’s in-context learning: no fine-tuning needed.
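Here's what that few-shot prompt looks like (hypothetical examples):

```python
# The "training data" lives entirely inside the prompt.
# No weights change; the model infers the pattern from the examples.
prompt = """Review: This film was a waste of two hours.
Sentiment: negative

Review: Best soundtrack I've heard all year.
Sentiment: positive

Review: I love this movie.
Sentiment:"""

# Send `prompt` to any decoder-only LLM and it will very likely
# continue with "positive".
```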
Prompting gets deep.

Advanced strategies:

• Chain of thought → “Let’s think step by step...”
• Decomposition → Break complex tasks into parts
• Self-refinement → Ask the model to critique itself
• RAG → Let it fetch real-time data from external sources
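For a taste of the first one, here's a chain-of-thought cue on a classic trick question (hypothetical prompts; behavior varies by model):

```python
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

direct_prompt = f"Q: {question}\nA:"
cot_prompt    = f"Q: {question}\nA: Let's think step by step."

# With the direct prompt, models often blurt out "$0.10" (wrong).
# With the step-by-step cue, they tend to write out the algebra
# and land on the correct answer: $0.05.
```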
This is all possible because of the way these models are trained: predict the next word over and over until they internalize language structure, reasoning patterns, and world knowledge.

It's not magic. It's scale.
But raw intelligence isn’t enough.

We need models to align with human goals.

That’s where alignment comes in.

It happens in two major phases after pre-training 👇
Supervised Fine-Tuning (SFT)

Feed the model good human responses. Let it learn how we want it to reply.

RLHF (Reinforcement Learning from Human Feedback)

Train a reward model to prefer helpful answers. Use it to steer the LLM.

This is how ChatGPT was aligned.
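A sketch of the pairwise loss typically used to train that reward model, assuming PyTorch (toy scores, not from a real model):

```python
import torch
import torch.nn.functional as F

def reward_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry style) loss: push the score of the
    human-preferred answer above the score of the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores the reward model assigned to two answer pairs:
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.4, -0.1])
print(reward_loss(r_chosen, r_rejected))  # ~0.44; shrinks as the gap grows
```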

RLHF is powerful but tricky.

Newer methods like Direct Preference Optimization (DPO) are rising fast.

Why?
They skip the separate reward model and the unstable RL loop, optimizing directly on preference pairs.

More stable. More scalable.
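Here's roughly what the DPO objective looks like in PyTorch (a sketch of the standard formulation, not a training-ready implementation):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization (sketch).

    pi_*  : policy log-probs of the preferred / dispreferred responses
    ref_* : same log-probs under the frozen reference model
    No separate reward model and no RL loop: preferences are
    optimized directly, which is why DPO trains more stably.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs for one preference pair:
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```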
Inference (how the model runs) is just as important as training.

To serve real-time outputs, we use tricks like:

→ Top-k / nucleus sampling
→ Caching past tokens
→ Batching requests
→ Memory-efficient attention

These make LLMs usable at scale.
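Here's a small sketch of nucleus (top-p) sampling in plain NumPy, with a toy vocabulary and made-up probabilities:

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p sampling: keep the smallest set of words whose total
    probability reaches p, renormalize, and sample within that set.
    Trims the long tail of junk words without the rigidity of a
    fixed top-k."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]        # most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    return rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum())

vocab = np.array(["mat", "floor", "roof", "moon"])
probs = np.array([0.6, 0.2, 0.1, 0.1])
print(vocab[nucleus_sample(probs)])  # usually "mat"; "moon" gets cut
```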
So how do LLMs really work?

→ Trained on massive text
→ Predict the next word millions of times
→ Use Transformers to encode dependencies
→ Adapt via prompting or fine-tuning
→ Aligned to human preferences
→ Served with smart inference
This was based on the brilliant textbook:

"Foundations of Large Language Models" by Tong Xiao and Jingbo Zhu (NiuTrans Research Lab)

arxiv: arxiv.org/abs/2501.09223

Highly recommend it if you're serious about understanding LLMs deeply.
P.S.

We built ClipYard for ruthless performance marketers.

→ Better ROAS
→ 10x faster content ops
→ No human error
→ Full creative control

You’ve never seen AI avatars like this before → clipyard.ai
I hope you've found this thread helpful.

Follow me @AlexanderFYoung for more.

