Henry Shi
Jan 21 · 11 tweets · 5 min read
DeepSeek just released the first Open Source Reasoning Model that matched o1!

But how did an unknown, 100-person startup with $0 VC funding produce a frontier open source model that rivaled OpenAI and Anthropic at 1/10th of the training cost and is 20-50x cheaper during inference?

After doing extensive research into the company's history, here’s the untold founding story of the rise, fall and rebirth behind DeepSeek and its parent company High-Flyer 🧵
1. Humble beginnings

In 2007, three engineers, Xu Jin, Zheng Dawei, and Liang Wenfeng (CEO), met at Zhejiang University and bonded over algorithmic trading.

Their idea? Build a quant fund powered by cutting-edge AI. But instead of hiring industry veterans, they prioritized raw talent and curiosity over experience. Liang: “Core technical roles are primarily filled by recent grads or those 1–2 years out.”
2. Quiet Ascent

The team worked quietly on various algorithmic trading ideas for 8 years before founding High-Flyer in 2015.

Their culture of hiring and innovation worked extremely well. By 2021, they were crushing it:
- Invested $140M and built a massive AI trading platform
- Owned 10,000 NVIDIA A100 GPUs
- Became a top 4 quant fund with $15B AUM

Then it all came crashing down...
3. Turning of Tides

2022 was a nightmare. High-Flyer’s success caught up with them.

They grew too big, too fast and started to lose billions.
- One fund lost 13.1% in a single quarter
- Another ended the year with an 8.1% loss
- CEO sent public apology letters
- They froze new investments

But that wasn't even the worst part...
4. Existential Threat

The Chinese government started to crack down on the quant trading industry amid an economic slowdown, a housing crisis and a declining stock market index.

The CSI 300 (Chinese blue-chip index) fell to a multi-year low. Regulators blamed high-frequency traders for exploiting the market and causing the selloff.

- Banned a quant competitor from trading for 3 days
- Banned another from opening index futures for 12 months
- Required strategy disclosures before trading
- Threatened to increase trading costs 10x to destroy the industry

High-Flyer faced extinction.

(High-Flyer’s funds have been flat/down since 2022 and have trailed the index by 4% since 2024)
4. Rebirth in AI

In 2023, instead of giving up, they pivoted. They spun out DeepSeek, an AI lab fueled by their existing talent and 10k GPUs. No VC funding. They went all-in.

The twist? They kept the same philosophy of hiring outsiders: brilliant, passionate and curious new grads over experienced AI researchers.

Liang: “There are no wizards. We are mostly fresh graduates from top universities, PhD candidates in their fourth or fifth year, and some young people who graduated just a few years ago.”
5. Early Breakthroughs

DeepSeek made waves in May 2024 with DeepSeek v2, introducing:
- MLA (Multi-head Latent Attention) and sparse MoE, cutting training costs by 42.5%.
- KV cache reductions of 93.3%.
- A 5.76x boost in max generation throughput.

In November, they released R1-Lite-Preview, the first competitor to OpenAI’s o1 reasoning model. It used a novel RL technique that leverages test-time compute, and it beat everyone else (open or closed source) to market.
6. Frontier Open Source Model

On Christmas, they shocked the AI world with DeepSeek v3:
- Trained for just $6M but rivaled GPT-4o and Claude 3.5 Sonnet.
- Introduced groundbreaking innovations like Multi-Token Prediction, FP8 mixed-precision training, reasoning capabilities distilled from R1, and an auxiliary-loss-free strategy for load balancing.
- API costs that are 20-50x cheaper than the competition:
  - DeepSeek: $0.14 / 1M in, $0.28 / 1M out
  - OpenAI: $2.50 / 1M in, $10 / 1M out
  - Anthropic: $3 / 1M in, $15 / 1M out
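
To sanity-check the "20-50x cheaper" figure, here is a minimal sketch that divides each competitor's per-million-token price by DeepSeek's. The prices are copied from the list above; the `PRICES` table and the `price_ratios` helper are names made up for this illustration, not anything from DeepSeek or the other providers.

```python
# Per-million-token API prices in USD, copied from the list above.
PRICES = {
    "DeepSeek": {"in": 0.14, "out": 0.28},
    "OpenAI": {"in": 2.50, "out": 10.00},
    "Anthropic": {"in": 3.00, "out": 15.00},
}


def price_ratios(baseline="DeepSeek"):
    """Return how many times more expensive each provider is than the baseline."""
    base = PRICES[baseline]
    return {
        name: {kind: round(p[kind] / base[kind], 1) for kind in ("in", "out")}
        for name, p in PRICES.items()
        if name != baseline
    }


for name, ratio in price_ratios().items():
    print(f"{name}: {ratio['in']}x input, {ratio['out']}x output")
```

With these numbers the ratios land between roughly 18x (OpenAI input) and 54x (Anthropic output), which is consistent with the 20-50x range quoted in the thread.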
7. Pushing the Frontier of AGI

This week, they were the first to release a fully open source reasoning model that matched OpenAI o1.

They shared their learnings publicly and revealed that they were able to train a version of this model (R1-Zero) through pure Reinforcement Learning, without Supervised Fine-Tuning as a preliminary step and with rule-based rewards instead of a learned reward model.

And the API costs are still 20-50x cheaper than the competition:
- DeepSeek R1: $0.14~$0.55 / 1M in, $2.19 / 1M out
- OpenAI o1: $7.50~$15 / 1M in, $60 / 1M out
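
Per-token prices are easier to judge on a concrete bill, so here is a rough sketch that prices one workload on both models. The workload (1M input tokens and 1M output tokens) is purely hypothetical, and `workload_cost` is a helper invented for this example. The thread does not say what separates the low and high input price, so the `cheap_input` flag simply picks one end of the quoted range.

```python
# USD per 1M tokens, from the list above: (low input, high input, output).
R1 = (0.14, 0.55, 2.19)
O1 = (7.50, 15.00, 60.00)


def workload_cost(prices, in_tokens, out_tokens, cheap_input=False):
    """Cost in USD of a workload, using the low or high end of the input price."""
    low_in, high_in, out = prices
    in_price = low_in if cheap_input else high_in
    return (in_tokens * in_price + out_tokens * out) / 1_000_000


# Hypothetical workload: 1M input tokens and 1M output tokens.
for cheap in (False, True):
    r1 = workload_cost(R1, 1_000_000, 1_000_000, cheap)
    o1 = workload_cost(O1, 1_000_000, 1_000_000, cheap)
    print(f"cheap_input={cheap}: R1 ${r1:.2f}, o1 ${o1:.2f}, ratio {o1 / r1:.1f}x")
```

For this particular mix the blended gap comes out a little under 30x at either end of the input price range. A workload with a different input/output split would shift the ratio, so treat it as an illustration rather than a benchmark.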
7. The lesson?

Sometimes having less means innovating more. DeepSeek proves you don't need:

- Billions in funding
- Hundreds of PhDs
- A famous pedigree

Just brilliant young minds, the courage to think differently and the grit to never give up 💪

If you found this insightful, please follow. I look forward to sharing more AI resources and learnings in the future.
Update: @zizhpan told me that's not the actual picture of the DeepSeek CEO. Apologies for the mix-up; I took the picture from the most recent publicly available interview.

Here's his actual picture (on the right).

The rest of the data in the thread about High-Flyer is still accurate.
