Thread by @henrythe9ths on Thread Reader App

DeepSeek just released the first Open Source Reasoning Model that matched o1!

But how did an unknown, 100-person startup with $0 in VC funding produce a frontier open source model that rivals OpenAI and Anthropic at 1/10th of the training cost and is 20-50x cheaper at inference?

After extensive research into the company's history, here’s the untold story of the rise, fall and rebirth of DeepSeek and its parent company High-Flyer 🧵

1. Humble beginnings

In 2007, three engineers, Xu Jin, Zheng Dawei, and Liang Wenfeng (CEO), met at Zhejiang University and bonded over algorithmic trading.

Their idea? Build a quant fund powered by cutting-edge AI. But instead of hiring industry veterans, they prioritized raw talent and curiosity over experience. Liang: “Core technical roles are primarily filled by recent grads or those 1–2 years out.”

2. Quiet Ascent

The team worked quietly on various algorithmic trading ideas for 8 years before founding High-Flyer in 2015.

Their culture of hiring and innovation worked extremely well. By 2021, they were crushing it:
- Invested $140M and built a massive AI trading platform
- Owned 10,000 NVIDIA A100 GPUs
- Became a top 4 quant fund with $15B AUM

Then it all came crashing down...

3. Turning of Tides

2022 was a nightmare. High-Flyer’s success caught up with them.

They grew too big, too fast and started to lose billions.
- One fund lost 13.1% in a single quarter
- Another ended the year with an 8.1% loss
- CEO sent public apology letters
- They froze new investments

But that wasn't even the worst part...

4. Existential Threat

The Chinese government started to crack down on the quant trading industry amid an economic slowdown, a housing crisis and a declining stock market.

The CSI300 (Chinese Blue Chip Index) fell to a multi-year low. Regulators blamed high-frequency traders for exploiting the market and causing the selloff. They:

- Banned a quant competitor from trading for 3 days
- Banned another from opening index futures for 12 months
- Required strategy disclosures before trading
- Threatened to increase trading costs 10x to destroy the industry

High-Flyer faced extinction.

(High-Flyer’s funds have been flat/down since 2022 and have trailed the index by 4% since 2024)

4. Rebirth in AI

In 2023, instead of giving up, they pivoted. They spun out DeepSeek, an AI lab fueled by their existing talent and 10k GPUs. No VC funding. They went all-in.

The twist? They kept the same philosophy of hiring outsiders: brilliant, passionate and curious new grads over experienced AI researchers.

Liang: “There are no wizards. We are mostly fresh graduates from top universities, PhD candidates in their fourth or fifth year, and some young people who graduated just a few years ago.”

5. Early Breakthroughs

DeepSeek made waves in early 2024 with DeepSeek-V2, which introduced MLA (Multi-head Latent Attention) and a sparse MoE architecture. Compared with their earlier DeepSeek 67B model, it delivered:
- Training costs cut by 42.5%.
- KV cache reduced by 93.3%.
- A 5.76x boost in max generation throughput.

In November, they released R1-Lite-Preview, the first competitor to OpenAI’s o1 reasoning model. It used a novel RL technique that leverages test-time compute, and it beat everyone else (open or closed source) to market.

6. Frontier Open Source Model

On Christmas, they shocked the AI world with DeepSeek-V3:
- Trained for just $6M but rivaled GPT-4o and Claude 3.5 Sonnet.
- Introduced groundbreaking innovations like Multi-Token Prediction, FP8 Mixed Precision Training, Distilled Reasoning Capabilities from R1 and an Auxiliary-loss-free Strategy for Load Balancing.
- API costs that are 20-50x cheaper than the competition:
  - DeepSeek: $0.14 / 1M in, $0.28 / 1M out
  - OpenAI: $2.50 / 1M in, $10 / 1M out
  - Anthropic: $3 / 1M in, $15 / 1M out
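
To sanity-check the "20-50x cheaper" figure, here is a minimal Python sketch that divides each competitor's per-1M-token price by DeepSeek's. The prices are copied from the bullets above; the `PRICES` table and the `price_ratios` helper are names made up for this illustration, not part of any vendor's API.

```python
# Per-1M-token list prices in USD, copied from the bullets above.
PRICES = {
    "DeepSeek": {"in": 0.14, "out": 0.28},
    "OpenAI": {"in": 2.50, "out": 10.00},
    "Anthropic": {"in": 3.00, "out": 15.00},
}


def price_ratios(baseline, prices):
    """Return how many times more each other vendor charges than the baseline."""
    base = prices[baseline]
    return {
        name: {side: p[side] / base[side] for side in ("in", "out")}
        for name, p in prices.items()
        if name != baseline
    }


ratios = price_ratios("DeepSeek", PRICES)
for name, r in ratios.items():
    print(f"{name}: {r['in']:.1f}x input, {r['out']:.1f}x output")
# OpenAI: 17.9x input, 35.7x output
# Anthropic: 21.4x input, 53.6x output
```

With these numbers the gap runs from roughly 18x (OpenAI input) to roughly 54x (Anthropic output), which is in line with the 20-50x claim.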

7. Pushing the Frontier of AGI

This week, they were the first to release a fully open source reasoning model that matched OpenAI o1.

They shared their learnings publicly and revealed that they were able to train a reasoning model (R1-Zero) through pure reinforcement learning, without supervised fine-tuning or a learned reward model.

And the API costs are still 20-50x cheaper than the competition:
- DeepSeek R1: $0.14~$0.55 / 1M in, $2.19 / 1M out
- OpenAI o1: $7.50~$15 / 1M in, $60 / 1M out
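
Because the input prices above are quoted as ranges, the gap is easier to see on a concrete bill. The sketch below prices one hypothetical workload (2M input tokens and 1M output tokens, chosen only for illustration) at both ends of each quoted range. The `cost_range` helper and the tuple layout are assumptions of this example.

```python
# Per-1M-token prices in USD from the bullets above:
# (low end of input range, high end of input range, output).
R1 = (0.14, 0.55, 2.19)
O1 = (7.50, 15.00, 60.00)


def cost_range(prices, input_tokens, output_tokens):
    """Return (min, max) USD cost across the quoted input-price range."""
    low_in, high_in, out = prices
    out_cost = output_tokens / 1e6 * out
    return (
        input_tokens / 1e6 * low_in + out_cost,
        input_tokens / 1e6 * high_in + out_cost,
    )


# Hypothetical workload, not taken from the thread.
r1_low, r1_high = cost_range(R1, 2_000_000, 1_000_000)
o1_low, o1_high = cost_range(O1, 2_000_000, 1_000_000)

print(f"DeepSeek R1: ${r1_low:.2f} to ${r1_high:.2f}")
print(f"OpenAI o1:   ${o1_low:.2f} to ${o1_high:.2f}")
print(f"Ratio: {o1_high / r1_high:.1f}x to {o1_low / r1_low:.1f}x")
# DeepSeek R1: $2.47 to $3.29
# OpenAI o1:   $75.00 to $90.00
# Ratio: 27.4x to 30.4x
```

For this particular mix the difference lands around 27-30x. A workload with more input tokens at the low end of the range would push the ratio higher.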

8. The lesson?

Sometimes having less means innovating more. DeepSeek proves you don't need:

- Billions in funding
- Hundreds of PhDs
- A famous pedigree

Just brilliant young minds, the courage to think differently and the grit to never give up 💪

If you found this insightful, please follow. I look forward to sharing more AI resources and learnings in the future.

Update: @zizhpan told me that's not the actual picture of the DeepSeek CEO. Apologies for the mixup; I took the picture from what was then the most recent publicly available interview.

Here's his actual picture (on the right).

The rest of the data in the thread about High-Flyer are still accurate.
