More from @SemiAnalysis_

SemiAnalysis

@SemiAnalysis_

Jul 18

As with DeepSeek in January 2025, Panicans may think the AI networking switch TAM will shrink massively because Kimi K3 uses Kimi Delta Attention (KDA), which reduces KV-transfer networking bandwidth by up to 10x. The opposite is true, as we explain below. 👇️ 1/8🧵

It is true that Kimi K3 uses Kimi Delta Attention (KDA), a linear-attention mechanism, in 3 out of every 4 layers, and that KDA reduces KV-cache transfer bandwidth by up to 10x compared with similar full global-attention models. The important missing piece is that serving Kimi K3 requires WideEP (wide expert parallelism). 2/8🧵

Because Kimi K3 has 2.8 trillion parameters, even at MXFP4 each forward pass must read roughly 1.5 TB of weights from HBM. This means that, even with speculative decoding, serving it profitably at a reasonable level of interactivity requires aggregating many chips over a high-bandwidth network, such as the GB300 NVL72. 3/8🧵
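The 1.5 TB figure follows from MXFP4's layout: under the OCP Microscaling spec, each block of 32 FP4 values shares one 8-bit scale, or about 4.25 bits per weight. Below is a minimal Python sketch of that arithmetic, plus the HBM-bound floor on time per forward pass. It assumes every weight is read once per step (reasonable at large batch sizes, where most experts are active) and uses a hypothetical round figure of 8 TB/s of HBM bandwidth per GPU; it ignores KV-cache reads, activations, and communication.

```python
# Sketch: weight bytes for a model stored in MXFP4, and the HBM-bound lower
# limit on time per forward pass. Per-GPU bandwidth is an assumed round number.

MXFP4_BLOCK = 32   # elements sharing one scale (OCP MX block size)
FP4_BITS = 4       # E2M1 element
SCALE_BITS = 8     # E8M0 shared scale per block


def mxfp4_bytes(num_params: float) -> float:
    """Bytes needed to store num_params weights in MXFP4, including block scales."""
    bits_per_param = FP4_BITS + SCALE_BITS / MXFP4_BLOCK  # 4.25 bits
    return num_params * bits_per_param / 8


def min_step_seconds(weight_bytes: float, num_gpus: int, hbm_tb_s_per_gpu: float) -> float:
    """Lower bound on one forward pass if every weight is read once from HBM and
    the weights are spread evenly over num_gpus (ignores KV cache, activations,
    and communication)."""
    aggregate_bytes_per_s = num_gpus * hbm_tb_s_per_gpu * 1e12
    return weight_bytes / aggregate_bytes_per_s


if __name__ == "__main__":
    weights = mxfp4_bytes(2.8e12)
    print(f"MXFP4 weights: {weights / 1e12:.2f} TB")  # roughly 1.5 TB
    # Hypothetical: ~8 TB/s of HBM bandwidth per GPU.
    for gpus in (8, 72):
        t = min_step_seconds(weights, gpus, 8.0)
        print(f"{gpus:>2} GPUs: >= {t * 1e3:.1f} ms per step, <= {1 / t:.0f} steps/s")
```

Spreading the same bytes over more GPUs is the only way to shrink this floor, which is why the argument turns to large scale-up domains.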

SemiAnalysis

@SemiAnalysis_

Jul 18

MASSIVE DELAY ALERT FOR ORACLE’S STARGATE SITE AND BLOOM ENERGY🚨🚨

Oracle’s Project Jupiter, a behind-the-meter datacenter in New Mexico that plans to use Bloom Energy fuel cells, is at risk of a 1-2 year delay due to permitting and pipeline-construction blockers. (1/8)🧵

We continue to monitor datacenter delays and whether they are real or fake. Some projects are outright delayed because gas pipelines must be built and permits for power-generation equipment must be obtained. (2/8)

Oracle's proposed 2.45 GW Project Jupiter site in New Mexico can't run at any meaningful capacity until a 17-mile pipeline (the Green Chile Pipeline) connecting the El Paso Natural Gas (EPNG) system to a delivery meter station is built. Oracle switched from turbines to Bloom Energy fuel cells earlier this year, but fuel cells run on pipeline gas too. All details below👇️ (3/8)

SemiAnalysis

@SemiAnalysis_

Jul 17

As with the panic over DeepSeek R1, some uneducated people think Kimi K3’s use of linear attention (Kimi Delta Attention, KDA) is bad for NVIDIA, HBM, DRAM, and networking because it has lower KV-cache requirements. The opposite is true, and we explain why below. 👇️ 1/8🧵

Kimi K3 is actually quite positive for NVIDIA, as large-model inference is where the NVL72 shines. Because K3 has more than 2.8 trillion parameters, it requires a large scale-up domain to store its weights. 2/8🧵
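To see why the weights alone outgrow a single server, here is a hypothetical sizing helper in Python. The 288 GB of HBM per GPU and the choice to leave half of it for weights (the rest going to KV cache and activations) are illustrative assumptions, not SemiAnalysis figures.

```python
import math

# Hypothetical sizing helper: how many GPUs are needed just to hold a model's
# weights, after reserving part of each GPU's HBM for KV cache and activations.


def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in bytes."""
    return num_params * bits_per_param / 8


def gpus_to_hold_weights(total_weight_bytes: float,
                         hbm_gb_per_gpu: float,
                         weight_fraction: float) -> int:
    """Minimum GPU count if only weight_fraction of each GPU's HBM holds weights."""
    usable_per_gpu = hbm_gb_per_gpu * 1e9 * weight_fraction
    return math.ceil(total_weight_bytes / usable_per_gpu)


if __name__ == "__main__":
    params = 2.8e12
    # Assumptions: 288 GB HBM per GPU, half of it left for weights.
    # MXFP4 is ~4.25 bits/weight with block scales; FP8 scales are ignored here.
    for name, bits in (("MXFP4", 4.25), ("FP8", 8.0)):
        n = gpus_to_hold_weights(weight_bytes(params, bits), 288, 0.5)
        print(f"{name}: at least {n} GPUs")
```

Under these assumptions the script reports 11 GPUs at MXFP4 and 20 at FP8, both more than a typical 8-GPU server; the exact count moves with the headroom reserved for KV cache.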

Second, although Kimi Delta Attention cuts the networking needed for KV-cache transfers by up to 10×, K3's large weights require even more network bandwidth to implement WideEP, an optimization that spreads the experts' weights across many GPUs, so each token must be sent to and returned from the GPUs holding its experts. 3/8🧵
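A rough Python model of why WideEP leans on the network: at every MoE layer, each token's hidden state is dispatched to the GPUs holding its selected experts, and the expert outputs are combined back. The hidden size, experts per token, layer count, and FP8-dispatch/BF16-combine precisions below are hypothetical placeholders, not published Kimi K3 specifications.

```python
# Rough model of WideEP (expert-parallel) all-to-all traffic per token.
# All model dimensions below are hypothetical placeholders, not Kimi K3 specs.


def wide_ep_bytes_per_token(hidden_dim: int, top_k: int, moe_layers: int,
                            num_gpus: int, dispatch_bytes: float = 1.0,
                            combine_bytes: float = 2.0) -> float:
    """Expected bytes one token moves across the network per forward pass.

    Assumes experts are spread evenly over num_gpus and each of the token's
    top_k experts sits on a uniformly random GPU, so (num_gpus - 1) / num_gpus
    of the expert slots are remote. Dispatch ships the hidden state to the
    expert (dispatch_bytes per element); combine ships the expert output back
    (combine_bytes per element). Sends to the same GPU are not deduplicated.
    """
    remote_fraction = (num_gpus - 1) / num_gpus
    per_layer = top_k * remote_fraction * hidden_dim * (dispatch_bytes + combine_bytes)
    return per_layer * moe_layers


if __name__ == "__main__":
    # Hypothetical: 7168-wide hidden state, 8 experts per token, 60 MoE layers,
    # FP8 dispatch and BF16 combine, experts spread over 72 GPUs.
    b = wide_ep_bytes_per_token(7168, 8, 60, 72)
    print(f"~{b / 1e6:.1f} MB of all-to-all traffic per generated token")
```

The design point the sketch is meant to show: a prefill-to-decode KV-cache transfer happens roughly once per request, while this dispatch/combine traffic recurs for every token at every MoE layer, so shrinking the KV cache alone need not shrink the network.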

SemiAnalysis

@SemiAnalysis_

Jul 13

On Feb 28th, 2026, the United States launched Operation “Epic Fury” against Iran.

Markets and media were caught off guard, but careful and diligent observers were hardly surprised.

The signals sat in plain sight for weeks; you just had to know where to look. A thread on OSINT, and on why we run some of our research the same way. (1/10)🧵

Before the first strike ever came down on Tehran, OSINT accounts right here on this platform were tracking the military buildup in the Middle East in near real time.

Here, for example, one could see that in the run-up to the 28th, at least 333 C-5 and C-17 transport flights were recorded leaving US bases for the Middle East theater. (2/10)

x.com/ArmchairAdml/s…

Additionally, here one could see U.S. F-22 Raptor stealth fighters massing at Israel's Uvda Air Base, with Patriot missile components being set up on site and new tarmac being built. (3/10)

https://x.com/EGYOSINT/status/2026913804550885509

SemiAnalysis

@SemiAnalysis_

Jul 7

TSMC’s moat is bigger than PPA (power, performance, area), EUV, or yield. It is the EDA/IP ecosystem wrapped around the fab. (1/8)🧵

TSMC’s Open Innovation Platform has turned Synopsys, Cadence, Arm, Rambus, Alphawave, and dozens of IP vendors into a pre-validated tape-out network. And that moat is measurable. (2/8)

TSMC’s certified Silicon IP library grew from 3K items in 2010 to 93K in 2025. 31x growth. (3/8)
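The 31x figure can be checked, and turned into an annualized rate, with a few lines of Python. The compound annual growth rate (CAGR) is derived only from the two endpoints in the tweet and says nothing about the year-by-year path.

```python
# Check the growth multiple and derive the implied compound annual growth rate.


def growth(start: float, end: float, years: int) -> tuple[float, float]:
    """Return (total multiple, CAGR) between two endpoint values."""
    multiple = end / start
    cagr = multiple ** (1 / years) - 1
    return multiple, cagr


multiple, cagr = growth(3_000, 93_000, 2025 - 2010)
print(f"{multiple:.0f}x total, {cagr:.1%} per year")  # prints: 31x total, 25.7% per year
```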

SemiAnalysis

@SemiAnalysis_

Jul 6

With AI mega clusters now reaching hundreds of thousands to millions of accelerators, cloud providers have faced a new set of challenges that force them to run chip interconnect at a new scale: linking multiple datacenters together. This is called scale-across. (1/7)🧵

Nvidia widely popularized the term scale-across last year, but it is now often used imprecisely as a generic label for almost any datacenter interconnect network. In short, scale-across refers to backend datacenter interconnect networks that connect multiple datacenters to form a single, coherent cluster. (2/7)

The main confusion comes from the fact that scale-across can use the same network equipment as traditional datacenter interconnect. Because these links typically span at least a few kilometers, operators can use ZR/ZR+ coherent pluggables and optical line systems, which can include amplifiers and passive (AWG) or active (WSS) multiplexers. (3/7)
