Alex Hughes
Aug 12 · 20 tweets
The best explanation of LLMs I've ever seen is in this new book.

"Foundations of LLMs 2025."

I've summarized the core concepts into a thread you can read in 3 minutes.

Finally, it all makes sense.
To understand LLMs, start with pre-training.

We don’t teach them specific tasks.

We flood them with raw text and let them discover patterns on their own.

This technique is called self-supervised learning and it’s the foundation of everything.
There are 3 ways to pre-train:

→ Unsupervised: No labels at all
→ Supervised: Classic labeled data
→ Self-supervised: Model creates its own labels (e.g., “guess the missing word”)

LLMs use #3: it scales like crazy and teaches them language from scratch.
Example of self-supervised learning:

“The early bird catches the worm.”

Mask some words:

→ “The [MASK] bird catches the [MASK]”

The model’s job? Fill in the blanks.

No human labels. The text is the supervision.
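To make that concrete, here's a minimal sketch of how masked training pairs could be built. Purely illustrative, not code from the book: a toy word-level masker, whereas real models mask subword tokens with extra rules.

```python
import random

def mask_words(sentence, mask_rate=0.15, mask_token="[MASK]"):
    """Build a self-supervised example: hide some words (the inputs)
    and remember what was hidden (the labels)."""
    inputs, labels = [], []
    for word in sentence.split():
        if random.random() < mask_rate:
            inputs.append(mask_token)
            labels.append(word)   # the hidden word becomes the target
        else:
            inputs.append(word)
            labels.append(None)   # nothing to predict at this position
    return " ".join(inputs), labels

random.seed(7)
print(mask_words("The early bird catches the worm"))
```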
This leads to 3 main model types:

→ Encoder-only (BERT): Understands text
→ Decoder-only (GPT): Generates next word
→ Encoder-decoder (T5): Translates input to output

Each has strengths. Think of them as different tools for different jobs.
Let’s break it down further.

Decoder-only (GPT-style):

Trained to guess the next word:

“The cat sat on the ___” → “mat”

This is called causal language modeling.
Loss is measured by how wrong the guesses are (cross-entropy).
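Here's what that loss looks like in a toy PyTorch sketch (my illustration: random logits stand in for a real model's predictions):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 10, 6
tokens = torch.randint(0, vocab_size, (1, seq_len))   # a toy "sentence"
logits = torch.randn(1, seq_len, vocab_size)          # fake model outputs

# Causal LM: the prediction at position t is scored against token t+1,
# so shift predictions and targets by one step.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for steps 0..T-2
    tokens[:, 1:].reshape(-1),               # targets are steps 1..T-1
)
print(loss.item())  # lower = better guesses
```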
Encoder-only (BERT-style):

Takes the whole sentence.
Randomly hides some words and tries to reconstruct them.

This is masked language modeling: it uses both left and right context.

Great for understanding, not generation.
Example:

Original:
→ “The early bird catches the worm”

Masked:
→ “The [MASK] bird catches the [MASK]”

The model predicts “early” and “worm” by understanding the whole sentence.

It’s learning language by solving puzzles.
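You can try this yourself. A quick sketch using Hugging Face's transformers library (assumes it's installed; the model downloads on first run; not from the book):

```python
from transformers import pipeline

# BERT-style masked language modeling: the model sees context on
# both sides of [MASK] when filling in the blank.
fill = pipeline("fill-mask", model="bert-base-uncased")

for guess in fill("The early [MASK] catches the worm."):
    print(guess["token_str"], round(guess["score"], 3))
```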
Encoder-decoder (T5, BART):

Treats everything as a text-to-text task.

Examples:

“Translate English to German: hello” → “hallo”
“Sentiment: I hate this” → “negative”

This setup lets one model do it all: QA, summarization, translation, etc.
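A small sketch of the text-to-text idea with transformers and the original t5-small checkpoint (assumes the library is installed; these are the task prefixes T5 was pre-trained with, but exact outputs may vary):

```python
from transformers import pipeline

# T5 frames every task as "text in, text out"; the prefix names the task.
t5 = pipeline("text2text-generation", model="t5-small")

print(t5("translate English to German: hello")[0]["generated_text"])
print(t5("sst2 sentence: I hate this")[0]["generated_text"])  # sentiment
```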
Once pre-trained, we have two options:

→ Fine-tune it on a labeled dataset
→ Prompt it cleverly to do new tasks

Fine-tuning adjusts weights.
Prompting? Just tweaks the input text.

Let’s dive into the magic of prompts.
Prompting = carefully phrasing input so the model does what you want.

Example:

“I love this movie. Sentiment:”

It’ll likely respond: “positive”

Add a few examples before it? That’s in-context learning: no fine-tuning needed.
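As an illustration, a few-shot prompt is just text (the reviews below are made up by me, not from the book):

```python
# In-context learning: the "training examples" live inside the prompt.
# No weights change; the model infers the pattern and continues it.
prompt = """Review: This film was a waste of time. Sentiment: negative
Review: Absolutely loved every minute. Sentiment: positive
Review: I love this movie. Sentiment:"""

# Fed to any completion-style LLM, the likely continuation is "positive".
```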
Prompting gets deep.

Advanced strategies:

• Chain of thought → “Let’s think step by step...” (sketch after this list)
• Decomposition → Break complex tasks into parts
• Self-refinement → Ask the model to critique itself
• RAG → Let it fetch relevant, up-to-date data from external sources
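Taking the first strategy as an example: zero-shot chain of thought is just one appended line (a sketch; the classic cue comes from Kojima et al., 2022):

```python
question = ("A bat and a ball cost $1.10 in total. The bat costs "
            "$1.00 more than the ball. How much does the ball cost?")

# Appending a reasoning cue nudges the model to show its work,
# which often fixes multi-step questions like this one.
cot_prompt = f"{question}\nLet's think step by step."

# With the cue, a capable model typically works out ball = $0.05
# instead of blurting the tempting wrong answer ($0.10).
```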
This is all possible because of the way these models are trained: predict the next word over and over until they internalize language structure, reasoning patterns, and world knowledge.

It's not magic. It's scale.
But raw intelligence isn’t enough.

We need models to align with human goals.

That’s where alignment comes in.

It happens in two major phases after pretraining 👇
Supervised Fine-Tuning (SFT)

Feed the model good human responses. Let it learn how we want it to reply.

RLHF (Reinforcement Learning from Human Feedback)

Train a reward model to prefer helpful answers. Use it to steer the LLM.

This is how ChatGPT was aligned.
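Under the hood, the reward model is usually trained on pairs of answers that humans ranked. A minimal sketch of that pairwise loss (my illustration of the standard Bradley-Terry objective, not the book's code):

```python
import torch
import torch.nn.functional as F

def reward_loss(r_chosen, r_rejected):
    """Push the reward of the human-preferred answer above the
    reward of the rejected one (pairwise Bradley-Terry loss)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scalar rewards for three preference pairs.
print(reward_loss(torch.tensor([1.2, 0.3, 2.0]),
                  torch.tensor([0.4, 0.9, 1.1])))
```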

RLHF is powerful but tricky.

Newer methods like Direct Preference Optimization (DPO) are rising fast.

Why?
They skip the separate reward model and the unstable RL loop, optimizing directly on preference pairs.

More stable. More scalable.
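The DPO objective fits in a few lines. A toy sketch (my paraphrase of the loss from Rafailov et al., 2023; the log-probabilities are made-up numbers):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_c, pi_r, ref_c, ref_r, beta=0.1):
    """Direct Preference Optimization: compare how much the policy
    favors the chosen vs. rejected answer, relative to a frozen
    reference model. No reward model, no RL loop."""
    margin = beta * ((pi_c - ref_c) - (pi_r - ref_r))
    return -F.logsigmoid(margin).mean()

# Toy summed log-probs for two preference pairs.
print(dpo_loss(torch.tensor([-5.0, -6.0]), torch.tensor([-7.0, -5.5]),
               torch.tensor([-5.5, -6.2]), torch.tensor([-6.8, -5.6])))
```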
Inference (how the model runs) is just as important as training.

To serve real-time outputs, we use tricks like:

→ Top-k / nucleus sampling (sketch after this list)
→ Caching past tokens
→ Batching requests
→ Memory-efficient attention

These make LLMs usable at scale.
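As an example, nucleus (top-p) sampling from the list above fits in a few lines (a sketch over random logits, not production code):

```python
import torch
import torch.nn.functional as F

def nucleus_sample(logits, p=0.9):
    """Top-p sampling: keep the smallest set of tokens whose
    cumulative probability exceeds p, then sample from it."""
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < p   # always keeps the top token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()     # renormalize the nucleus
    return sorted_idx[torch.multinomial(sorted_probs, 1)]

print(nucleus_sample(torch.randn(50)))  # one sampled token id
```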
So how do LLMs really work?

→ Trained on massive text
→ Predict the next word millions of times
→ Use Transformers to encode dependencies
→ Adapt via prompting or fine-tuning
→ Aligned to human preferences
→ Served with smart inference
This thread was based on the brilliant textbook:

"Foundations of Large Language Models" by Tong Xiao and Jingbo Zhu (NiuTrans Research Lab)

arXiv: arxiv.org/abs/2501.09223

Highly recommend it if you're serious about understanding LLMs deeply.
I hope you've found this thread helpful.

Follow me @alxnderhughes for more.
