Morgan Brown · Jan 27
🧵 Finally had a chance to dig into DeepSeek's R1…

Let me break down why DeepSeek's AI innovations are blowing people's minds (and possibly threatening Nvidia's ~$3T market cap) in simple terms...
0/ First off, shout-out to @doodlestein, who wrote the must-read on this here: youtubetranscriptoptimizer.com/blog/05_the_sh…
1/ First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It's like needing a whole power plant to run a factory.
2/ DeepSeek just showed up and said "LOL what if we did this for $5M instead?" And they didn't just talk - they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.
3/ How? They rethought everything from the ground up. Traditional AI training stores every number in 32 bits of precision. DeepSeek was like "what if 8 bits is accurate enough?" Boom - 75% less memory needed.
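To make that concrete, here's a minimal Python sketch of the underlying idea: plain symmetric 8-bit quantization with NumPy. Illustrative only - DeepSeek actually trains with FP8 mixed-precision kernels, which are far more sophisticated - and all the names below are mine:

```python
import numpy as np

# Illustrative sketch only: symmetric int8 quantization, NOT DeepSeek's
# actual FP8 training pipeline. Function names are hypothetical.

def quantize_int8(w: np.ndarray):
    """Map a float32 tensor to int8 plus a single scale factor."""
    scale = np.abs(w).max() / 127.0                 # largest value -> 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # fake weight matrix
q, scale = quantize_int8(w)

print(w.nbytes / q.nbytes)                           # 4.0: 75% memory saved
print(np.abs(w - dequantize(q, scale)).max())        # small rounding error
```

32 bits down to 8 bits is a 4x reduction, which is exactly where the "75% less memory" number comes from; the price is a small rounding error per value.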
4/ Then there's their "multi-token" system. Normal AI reads like a first-grader: "The... cat... sat..." DeepSeek reads in whole phrases at once. 2x faster, 90% as accurate. When you're processing billions of words, this MATTERS.
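Here's a deliberately simplified PyTorch sketch of the multi-token idea (sizes and names are invented, and DeepSeek's actual multi-token-prediction module is more involved): instead of one output head predicting the next token, k heads predict the next k tokens from the same hidden state.

```python
import torch
import torch.nn as nn

# Toy multi-token prediction: k linear heads read the same hidden state
# and each predicts one of the next k tokens, so one forward pass
# proposes k tokens instead of 1. All sizes below are arbitrary.

class MultiTokenHead(nn.Module):
    def __init__(self, d_model: int = 512, vocab: int = 32000, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab) for _ in range(k)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) -> logits: (batch, k, vocab)
        return torch.stack([head(hidden) for head in self.heads], dim=1)

hidden = torch.randn(2, 512)          # stand-in for a transformer's output
logits = MultiTokenHead()(hidden)
print(logits.shape)                   # torch.Size([2, 4, 32000]): 4 tokens/step
```

If the cheap extra guesses are right most of the time, you get a big speedup for a small accuracy cost - that's the 2x / 90% tradeoff above.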
5/ But here's the really clever bit: They built an "expert system." Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.
6/ A traditional dense model? All of its parameters (a rumored ~1.8 trillion for GPT-4) active ALL THE TIME. DeepSeek? 671B total but only 37B active at once. It's like having a huge team but only calling in the experts you actually need for each task.
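A tiny mixture-of-experts sketch in PyTorch shows the mechanic (8 toy experts, top-2 routing; the real models route among hundreds of experts and add load balancing, and every name and size here is made up): the router scores all experts for each token, but only the winners actually run.

```python
import torch
import torch.nn as nn

# Toy mixture-of-experts: a router picks the top-2 of 8 experts per
# token, so only ~1/4 of expert parameters do work on any given input.
# Same principle that lets 671B total params run with only 37B active.

class TinyMoE(nn.Module):
    def __init__(self, d: int = 512, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)     # scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # mix the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # run ONLY selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 512)
print(TinyMoE()(x).shape)                          # torch.Size([16, 512])
```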
7/ The results are mind-blowing:
- Training cost: $100M → $5M
- GPUs needed: 100,000 → 2,000
- API costs: 95% cheaper
- Distilled versions run on gaming GPUs instead of data center hardware
8/ "But wait," you might say, "there must be a catch!" That's the wild part - it's all open source. Anyone can check their work. The code is public. The technical papers explain everything. It's not magic, just incredibly clever engineering.
9/ Why does this matter? Because it breaks the model of "only huge tech companies can play in AI." You don't need a billion-dollar data center anymore. A few good GPUs might do it.
10/ For Nvidia, this is scary. Their entire business model is built on selling super expensive GPUs at sky-high margins. If everyone can suddenly do AI with regular gaming GPUs... well, you see the problem.
11/ And here's the kicker: DeepSeek did this with a team of <200 people. Meanwhile, Meta has teams where the compensation alone exceeds DeepSeek's entire training budget... and their models aren't as good.
12/ This is a classic disruption story: Incumbents optimize existing processes, while disruptors rethink the fundamental approach. DeepSeek asked "what if we just did this smarter instead of throwing more hardware at it?"
13/ The implications are huge:
- AI development becomes more accessible
- Competition increases dramatically
- The "moats" of big tech companies look more like puddles
- Hardware requirements (and costs) plummet
14/ Of course, giants like OpenAI and Anthropic won't stand still. They're probably already implementing these innovations. But the efficiency genie is out of the bottle - there's no going back to the "just throw more GPUs at it" approach.
15/ Final thought: This feels like one of those moments we'll look back on as an inflection point. Like when PCs made mainframes less relevant, or when cloud computing changed everything.

AI is about to become a lot more accessible, and a lot less expensive. The question isn't if this will disrupt the current players, but how fast.

/end
P.S. And yes, all this is available open source. You can literally try their models right now. We're living in wild times! 🚀
Momma, I'm going viral! No Substack or GoFundMe to share, but a few things to add/clarify:

1/ The DeepSeek app is not the same thing as the model. The app is owned and operated by a Chinese corporation; the model itself is open source.

2/ Jevons paradox (when efficiency improves, total consumption often goes up, not down) is the counter argument. Thanks papa @satyanadella. Could be a mix shift in chip type, compute type, etc., but we're constrained by power and compute right now, not by demand.

3/ The techniques used are not groundbreaking. It's the combination of them, w/ the resulting model performance, that is so exciting. These are common engineering techniques that, combined, really fly in the face of "more compute is the only answer" for model performance. Compute is no longer a moat.

4/ Thanks to all for pointing out my NVIDIA market cap numbers miss and other nuances - will do better next time, coach. 🫡
