Lennart Heim
Mar 11 · 16 tweets · 4 min read
Huawei's next AI accelerator, the Ascend 910C, is entering production. It's China's best AI chip.
Thanks to backdoor sourcing, we could easily see 1M H100-equivalents this year.
Here's what we know about its performance and strategic implications. Spoiler: selectively competitive. 1/
The 910C is basically two co-packaged Ascend 910Bs, China's best current-gen accelerator. But there's a twist: most (potentially all) of these chips weren't produced domestically. They were illicitly procured from TSMC despite export controls. 2/
I'd expect the 910C to achieve ~800 TFLOP/s at FP16 and ~3.2 TB/s memory bandwidth. This makes it only ≈80% as performant as NVIDIA's previous-generation H100 (from 2022) while using 60% more logic die area. 3/
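Those headline numbers can be sanity-checked against NVIDIA's published H100 (SXM) specs, roughly 990 dense FP16 TFLOP/s and 3.35 TB/s of HBM3 bandwidth; the 910C figures are this thread's estimates:

```python
# Ascend 910C (estimated) vs. NVIDIA H100 SXM (datasheet, dense FP16).
ascend_910c = {"fp16_tflops": 800, "mem_bw_tbps": 3.2}   # thread's estimate
h100 = {"fp16_tflops": 990, "mem_bw_tbps": 3.35}         # NVIDIA datasheet

compute_ratio = ascend_910c["fp16_tflops"] / h100["fp16_tflops"]
bw_ratio = ascend_910c["mem_bw_tbps"] / h100["mem_bw_tbps"]

print(f"FP16 compute:     {compute_ratio:.0%} of H100")  # ~81%
print(f"Memory bandwidth: {bw_ratio:.0%} of H100")       # ~96%
```

Note that compute, not memory bandwidth, is the larger gap here; bandwidth would be near parity if the stacked memory can actually sustain 3.2 TB/s.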
Unlike NVIDIA's advanced packaging in the B100/200 series, the 910C likely uses a less technically sophisticated approach with two separate silicon interposers connected by an organic substrate. 4/
This could mean 10-20x less die-to-die bandwidth than NVIDIA's solution, a gap that has to be worked around through engineering. If the bandwidth is that low, the 910C isn't really one chip but two loosely coupled ones, and engineers writing software for it need to take that into account. 5/
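For scale: NVIDIA's B200 joins its two dies with the ~10 TB/s NV-HBI link. The 0.5-1 TB/s range below for an organic-substrate bridge is my illustrative assumption, not a measured 910C figure:

```python
# Hypothetical die-to-die bandwidth gap (illustrative numbers only).
nvidia_d2d_tbps = 10.0                       # B200 NV-HBI, per NVIDIA
ascend_d2d_low, ascend_d2d_high = 0.5, 1.0   # assumed organic-substrate link

print(f"gap: {nvidia_d2d_tbps / ascend_d2d_high:.0f}x to "
      f"{nvidia_d2d_tbps / ascend_d2d_low:.0f}x")  # 10x to 20x
```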
The technical gap is substantial. Compared to NVIDIA's B200, which will go into data centers this year, the 910C has ~3x less computational performance, ~2.5x less memory bandwidth (assuming HBM2E, which Huawei has stockpiled; HBM3 is also possible), and much worse power efficiency. 6/
Huawei likely obtained close to 3M Ascend dies (7nm) illicitly from TSMC; that loophole has since been closed by the foundry due-diligence rule.
They also stockpiled HBM2E memory from Samsung (now also controlled, but the purchases predate the restriction): enough for potentially 1.4M 910C accelerators.
7/
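A quick back-of-envelope on those stockpiles (die and HBM counts are this thread's figures; two compute dies per 910C):

```python
# How the die stockpile translates into finished accelerators.
dies_procured = 2.9e6   # "close to 3M" Ascend dies reportedly from TSMC
dies_per_910c = 2       # 910C = two co-packaged 910B-class dies

die_limited_units = dies_procured / dies_per_910c
print(f"die-limited: ~{die_limited_units / 1e6:.2f}M 910Cs")  # ~1.45M
# The HBM2E stockpile reportedly supports ~1.4M units, so the memory and
# logic-die stockpiles are roughly matched; neither leaves much slack.
```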
In addition, @Gregory_C_Allen just shared some estimates of China's own advanced production capacity. Domestic fabs should be able to produce 910B and 910C dies at the 7nm node.
8/
But we've yet to see a teardown of a 910B or 910C that was actually produced domestically (I think domestic production is possible, but expect the majority of dies to have come illegally from TSMC).
9/
While impressive, this still falls short of Western production: the West will field at least 5x as many chips in 2025, with 10-20x the computing power. The US compute advantage in total remains strong. 10/
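The two claims are consistent with each other: a ~5x chip-count lead becomes a 10-20x compute lead if each Western chip carries roughly 2-4x the 910C's performance (my assumed range, bracketing the ~3x B200 gap this thread cites):

```python
# Chip-count lead x per-chip performance lead = aggregate compute lead.
chip_count_ratio = 5                 # thread: >=5x more Western chips in 2025
per_chip_low, per_chip_high = 2, 4   # assumed per-chip performance gap

print(f"aggregate gap: ~{chip_count_ratio * per_chip_low}x "
      f"to ~{chip_count_ratio * per_chip_high}x")  # ~10x to ~20x
```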
Having 10x more compute is a key strategic advantage, as I've argued before. But it matters whether that compute is dispersed across many companies. China can centralize more easily than we can; that's a key thing to watch out for. 11/
This means China will be competitive in many domains.
Expect competitive models, with further gains especially from reasoning. However, the next pre-training generation might require new, bigger clusters of tens of thousands of chips. 12/
Furthermore, to benefit from those models, countries will want to deploy them to millions of users, or run large numbers of AI agents autonomously, and there total compute quantity still matters. That's where we will see the impact of these controls. 13/
To summarize: per-chip performance isn't impressive, only ~80% of the H100 with a four-year delay. But Huawei can compensate by clustering more chips, given the substantial number of illicit dies procured from TSMC (and potentially smaller quantities from SMIC). 14/
There will be competitive models from China—the talent and compute are there.
This doesn't mean export controls failed; it's just critical to understand what China can deliver, what export controls allow, and what they do not. 15/
I've shared before the complementary approaches we need: AI resilience, AI for defense, and more. I'll write this all up at some point to pre-empt another DeepSeek-style freakout.
Thanks to @Huang_Sihao and others! 16/16

