Lennart Heim
Mar 11 · 16 tweets · 4 min read
Huawei's next AI accelerator, the Ascend 910C, is entering production. It's China's best AI chip.
Thanks to backdoor sourcing, we could easily see 1M H100-equivalents this year.
Here's what we know about its performance and strategic implications. Spoiler: selectively competitive. 1/
The 910C is basically two co-packaged Ascend 910Bs, China's best current-gen accelerator. But there's a twist: most (potentially all) of these chips weren't produced domestically. They were illicitly procured from TSMC despite export controls. 2/
I'd expect the 910C to achieve ~800 TFLOP/s at FP16 and ~3.2 TB/s memory bandwidth. This makes it only ≈80% as performant as NVIDIA's previous-generation H100 (from 2022) while using 60% more logic die area. 3/
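Those headline numbers can be sanity-checked against NVIDIA's published H100 (SXM) specs, roughly 990 dense FP16 TFLOP/s and 3.35 TB/s of HBM3 bandwidth; the 910C figures are this thread's estimates:

```python
# Ascend 910C (estimated) vs. NVIDIA H100 SXM (datasheet, dense FP16).
ascend_910c = {"fp16_tflops": 800, "mem_bw_tbps": 3.2}   # thread's estimate
h100 = {"fp16_tflops": 990, "mem_bw_tbps": 3.35}         # NVIDIA datasheet

compute_ratio = ascend_910c["fp16_tflops"] / h100["fp16_tflops"]
bw_ratio = ascend_910c["mem_bw_tbps"] / h100["mem_bw_tbps"]

print(f"FP16 compute:     {compute_ratio:.0%} of H100")  # ~81%
print(f"Memory bandwidth: {bw_ratio:.0%} of H100")       # ~96%
```

Note that compute, not memory bandwidth, is the larger gap here; bandwidth would be near parity if the stacked memory can actually sustain 3.2 TB/s.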
Unlike NVIDIA's advanced packaging in the B100/200 series, the 910C likely uses a less technically sophisticated approach with two separate silicon interposers connected by an organic substrate. 4/
This could mean 10-20x less die-to-die bandwidth than NVIDIA's solution, a gap that has to be worked around through engineering. If the bandwidth is that low, the 910C isn't really one chip but two loosely coupled ones, and engineers writing software for it need to take that into account. 5/
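For scale: NVIDIA's B200 joins its two dies with the ~10 TB/s NV-HBI link. The 0.5-1 TB/s range below for an organic-substrate bridge is my illustrative assumption, not a measured 910C figure:

```python
# Hypothetical die-to-die bandwidth gap (illustrative numbers only).
nvidia_d2d_tbps = 10.0                       # B200 NV-HBI, per NVIDIA
ascend_d2d_low, ascend_d2d_high = 0.5, 1.0   # assumed organic-substrate link

print(f"gap: {nvidia_d2d_tbps / ascend_d2d_high:.0f}x to "
      f"{nvidia_d2d_tbps / ascend_d2d_low:.0f}x")  # 10x to 20x
```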
The technical gap is substantial. Compared to NVIDIA's B200, which will go into data centers this year, the 910C has ~3x less computational performance, ~2.5x less memory bandwidth (assuming HBM2E, which Huawei has stockpiled; HBM3 is also possible), and much worse power efficiency. 6/
Huawei likely obtained close to 3M Ascend dies (7nm) illicitly from TSMC; that loophole has since been closed by the foundry due-diligence rule.
They also stockpiled HBM2E memory from Samsung (now also controlled, but the purchases predate the restriction): enough for potentially 1.4M 910C accelerators.
7/
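A quick back-of-envelope on those stockpiles (die and HBM counts are this thread's figures; two compute dies per 910C):

```python
# How the die stockpile translates into finished accelerators.
dies_procured = 2.9e6   # "close to 3M" Ascend dies reportedly from TSMC
dies_per_910c = 2       # 910C = two co-packaged 910B-class dies

die_limited_units = dies_procured / dies_per_910c
print(f"die-limited: ~{die_limited_units / 1e6:.2f}M 910Cs")  # ~1.45M
# The HBM2E stockpile reportedly supports ~1.4M units, so the memory and
# logic-die stockpiles are roughly matched; neither leaves much slack.
```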
In addition, @Gregory_C_Allen just shared some estimates of China's own advanced production capacity. Domestic fabs should be able to produce 910B and 910C dies at the 7nm node.
8/
But we've yet to see a teardown of a 910B or 910C that was actually produced domestically (I think domestic production is possible, but expect the majority of dies to have come illegally from TSMC).
9/
While impressive, this still falls short of Western production: the West will field at least 5x as many chips in 2025, with 10-20x the computing power. The US compute advantage in total remains strong. 10/
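The two claims are consistent with each other: a ~5x chip-count lead becomes a 10-20x compute lead if each Western chip carries roughly 2-4x the 910C's performance (my assumed range, bracketing the ~3x B200 gap this thread cites):

```python
# Chip-count lead x per-chip performance lead = aggregate compute lead.
chip_count_ratio = 5                 # thread: >=5x more Western chips in 2025
per_chip_low, per_chip_high = 2, 4   # assumed per-chip performance gap

print(f"aggregate gap: ~{chip_count_ratio * per_chip_low}x "
      f"to ~{chip_count_ratio * per_chip_high}x")  # ~10x to ~20x
```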
Having 10x more compute is a key strategic advantage, as I've argued before. But it matters whether that compute is dispersed across many companies. China can centralize more easily than we can; that's a key thing to watch out for. 11/
This means China will be competitive in many domains.
Expect competitive models, with further gains especially from reasoning. However, the next pre-training generation might require new, bigger clusters of tens of thousands of chips. 12/
Furthermore, to benefit from those models, countries will want to deploy them to millions of users, or run large numbers of AI agents autonomously, and there total compute quantity still matters. That's where we will see the impact of these controls. 13/
To summarize: per-chip performance isn't impressive, only ~80% of the H100 with a four-year delay. But Huawei can compensate by clustering more chips, given the substantial number of illicit dies procured from TSMC (and potentially smaller quantities from SMIC). 14/
There will be competitive models from China—the talent and compute are there.
This doesn't mean export controls failed; it's just critical to understand what China can deliver, what export controls allow, and what they do not. 15/
I've shared before the complementary approaches we need: AI resilience, AI for defense, and more. I'll write this all up at some point to pre-empt another DeepSeek-style freakout.
Thanks to @Huang_Sihao and others! 16/16

