SemiAnalysis
Jul 1 4 tweets 2 min read
Google's next TPU, codenamed Humufish, is set to use Intel's EMIB-T instead of TSMC CoWoS.

Nearly every leading AI training accelerator today is packaged on a TSMC 2.5D flow, and almost all of it is CoWoS. CoWoS is the industry default, which is exactly why a flagship part moving off it is worth attention.

The core difference. CoWoS places all dies on a single large silicon/RDL interposer. EMIB embeds small silicon bridges directly in the organic substrate, only where die-to-die links are needed. (1/4)🧵

So why EMIB?

🟠 EMIB isn't bound by the interposer reticle limit. A CoWoS silicon interposer is patterned by lithography, so its size is capped by the reticle; the monolithic version (CoWoS-S) maxed out near 3.3x the reticle size, which is why TSMC moved to CoWoS-L. EMIB bridges are small and placed only where needed, so the package can scale without hitting that limit.

🟠 Efficiency and cost. EMIB packaging is meaningfully cheaper, since it drops the costly interposer entirely. EMIB also uses silicon far more efficiently than CoWoS. A wafer is round, so large interposers waste area at the edge and yield worse as they grow, while tiny bridges tile densely with little waste. It also gives buyers a second source outside TSMC. (2/4)
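The wafer-edge argument can be made concrete with the common gross-die-per-wafer approximation. This is only a sketch under stated assumptions: the 26 × 33 mm reticle field and 300 mm wafer are standard figures, the 3.3x multiple comes from the thread, but the 8 × 8 mm bridge size is a hypothetical placeholder, and the formula ignores defect yield, scribe lanes, and edge exclusion.

```python
import math

WAFER_DIAMETER_MM = 300.0
RETICLE_AREA_MM2 = 26.0 * 33.0  # standard full-field reticle, 858 mm^2


def gross_dies_per_wafer(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_d_mm / 2) ** 2
    edge_loss = math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2)
    return max(0, int(wafer_area / die_area_mm2 - edge_loss))


def silicon_utilization(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> float:
    """Fraction of the wafer area that ends up in whole dies."""
    wafer_area = math.pi * (wafer_d_mm / 2) ** 2
    return gross_dies_per_wafer(die_area_mm2, wafer_d_mm) * die_area_mm2 / wafer_area


# CoWoS-S class interposer at the ~3.3x reticle ceiling mentioned in the thread.
interposer_area = 3.3 * RETICLE_AREA_MM2
# Hypothetical bridge size, chosen only for illustration.
bridge_area = 8.0 * 8.0

for name, area in [("interposer", interposer_area), ("bridge", bridge_area)]:
    print(f"{name}: {gross_dies_per_wafer(area)} per wafer, "
          f"{silicon_utilization(area):.0%} of wafer area used")
```

Under these assumptions the large interposer leaves roughly half the wafer unused, while the small bridge uses most of it, which is the geometric effect the thread describes.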
Jun 29 4 tweets 1 min read
INTERESTING: Only 3 months after Rubin Ultra was announced at GTC 2026, the original 4-die Rubin Ultra has been cancelled due to manufacturing execution concerns. The new “Rubin Ultra” is half the size and roughly half the real-world performance of the original Rubin Ultra. 1/4🧵

This all comes against the backdrop of NVIDIA’s market share being eroded by Trainium, TPUs, and AMD chips. For NVIDIA to maintain pole position, it must be aggressive in execution. Manufacturing execution issues like this will only lead to more market share being chipped away. 2/4🧵
Jun 29 6 tweets 3 min read
One of the most underappreciated ways to play the AI semiconductor buildout may be through materials rather than chips themselves.

As the industry races to produce more advanced semiconductors, demand isn’t just rising for GPUs and wafer fab equipment; it’s rising for the critical materials that make modern chips possible. (1/6)🧵

Tungsten is a great example.

It is one of the most critical materials in semiconductor fabrication, prized for its high-temperature stability and resistance to electrical wear. Fabs rely on CVD to fill the deep, high-aspect-ratio vertical vias that link multi-layered chip architectures, while utilizing PVD to deposit the ultra-thin structural barrier layers surrounding them. Because it spans both core deposition categories, tungsten is completely non-negotiable for advanced chip production. (2/6)
Jun 28 7 tweets 2 min read
BREAKING NEWS: The Founder/CEO of LeptonAI has left only a year after LeptonAI’s acquisition. This is quite shocking, as Jensen reportedly spent $700M acquiring LeptonAI. What did he see? DGX Lepton flopped and got nowhere near the success Jensen expected. 1/7🧵

Initially, NVIDIA claimed that Lepton’s core software platform would be open-sourced by 2026. That has yet to happen. While we were skeptical, we wanted to believe that NVIDIA would open-source the core Lepton software platform, given that Lepton’s CEO is the co-creator of Caffe, ONNX, and PyTorch. 2/7🧵
Jun 27 4 tweets 2 min read
One of the more uncomfortable observations in our AI Value Capture piece is internal: our token spend at SemiAnalysis now runs at roughly 30% of employee compensation, with employees pulling just under 5 billion tokens per month on average, over 5x more than Meta, and our top contributors clearing 100 billion. We wrote about it openly because every research firm, hedge fund, and law firm we know is heading toward a similar number, just on a delay. (1/4)🧵

The substitution math is the part to internalize. Tasks that used to need a junior analyst for several hours (converting a model to a dashboard, building chart packs from earnings, rebuilding a comp set) now resolve in minutes for a few dollars of tokens. The blended Opus 4.7 cost we observe is about $0.99 per million tokens against the $5/$25 sticker price, mostly because agentic workloads run 300:1 input-to-output ratios and cache hit rates above 90% pull the effective price down. That's a real change in the unit economics of professional services, not a 10% efficiency gain. (2/4)
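A rough sketch of how the blended price can fall that far below sticker. The $5/$25 prices, the 300:1 input-to-output ratio, and the 90% cache hit rate come from the thread; pricing cached input at 10% of the normal input price is an assumption made here, and cache-write surcharges are ignored.

```python
def blended_cost_per_million(
    input_price: float,
    output_price: float,
    input_to_output_ratio: float,
    cache_hit_rate: float,
    cached_price_fraction: float = 0.10,  # assumption: cached input billed at 10% of input price
) -> float:
    """Blended $ per million tokens across cached input, uncached input, and output."""
    in_tokens = input_to_output_ratio  # input tokens per 1 output token
    out_tokens = 1.0
    cached = in_tokens * cache_hit_rate
    uncached = in_tokens - cached
    cost = (
        cached * input_price * cached_price_fraction
        + uncached * input_price
        + out_tokens * output_price
    )
    return cost / (in_tokens + out_tokens)


with_cache = blended_cost_per_million(5.0, 25.0, 300, 0.90)
no_cache = blended_cost_per_million(5.0, 25.0, 300, 0.0)
print(f"blended with 90% cache hits: ${with_cache:.2f}/M tokens")
print(f"blended with no caching:     ${no_cache:.2f}/M tokens")
```

With these inputs the blended figure lands close to $1 per million tokens, and a hit rate slightly above 90% pushes it below that, in line with the observed $0.99.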
Jun 26 5 tweets 2 min read
H100 Ornn index spot prices are falling, now at $2.42 per hour, roughly 40% below the May peak. The ecosystem is concerned that this is a sign that compute demand, and by extension the appetite for AI, is waning. (1/5)🧵

The important signal is that this is likely a spot price index, not term pricing. In our neocloud survey, 1-year H100 contract prices have instead climbed from a trough of roughly $1.70 per hour late last year to about $2.65 per hour today. (2/5)
Jun 22 4 tweets 1 min read
AI demand is outstripping Moore's law in the short run
Moore's law drove import prices of computers and semiconductors down by 52% between 2001 and 2020. (1/4)🧵

AI demand has surged so high that import prices for computers and semiconductors rose 3.6% in May, now up 14.4% year-to-year. This is so far from anything in the historical record that 'fastest ever' doesn't do justice to it. (2/4)
Jun 15 10 tweets 4 min read
China is Mogging Western Auto, and that’s Bad for Semis, National Security & War

If you live anywhere outside the US, you've noticed it: the streets are filling up with cars you've never seen before. Chery? Jaecoo? Zeekr? Leapmotor? BYD? No, you didn't miss a decade of car launches. They're Chinese. And they're everywhere. (1/10)🧵

Israel is the perfect case study to understand what’s really going on: high car ownership, zero domestic production & no restrictions on auto imports from China. Here’s what the data shows:

China's share of Israel's auto import value:
2023: 23.7%
2024: 29.1%
2025: 36.6%
2026 YTD: 40.2%
(2/10)
Jun 10 4 tweets 2 min read
What's the better business model for an AI lab, subscription or API? (1/4)🧵

Recently, we purchased one of each Anthropic/OpenAI subscription plan and randomly ran long-horizon coding tasks until we exhausted the weekly limit. It's widely believed that a $200/month plan maxes out at ~$2000/month worth of tokens (assuming API pricing). However, we found that the subscriptions are actually far more generous. (2/4)
May 13 10 tweets 2 min read
OSATs are usually seen as “boring” semiconductor companies. But we’ve been, and remain, bullish on Amkor ($AMKR) and ASE ($3711.TW). Why?
Because both sides of the OSAT model, Assembly & Test, are starting to shift in a meaningful way. (1/10) 🧵

Let’s focus on the first for now: Assembly.

Historically, packaging = low-margin wire bonding. Not exciting.
ASE once made up ~40% of $KLIC’s wire bonder business. After the COVID boom, capacity flooded the market and growth stalled. (2/10)
May 12 11 tweets 3 min read
Something to watch closely as the war in Iran drags on:

A very obscure part of the semiconductor supply chain, Naphtha, is potentially becoming a quiet constraint on AI chips. (1/11) 🧵

Here is what the supply chain looks like:

An oil called Naphtha is shipped on giant tankers from Middle Eastern countries like Kuwait, the UAE and Saudi Arabia by companies like Aramco (Saudi Arabia) or ADNOC (UAE).

Then integrated Japanese chemical companies like Daicel and Toagosei and Korean petrochemical giants like LG Chem or Lotte Chemical "crack" it (break it down) in massive factories to create propylene gas. (2/11)
May 6 10 tweets 2 min read
Earlier this year, Micron announced it would acquire PSMC’s P5 Tongluo fab in Miaoli, Taiwan—the process has officially begun.

At first glance, this looked like a straightforward legacy logic/memory fab acquisition. But the details are worth a close look. (1/10) 🧵

The site has two key sections: Section A and Section B.

Section A already exists and is now being converted, likely for Micron’s 1b DRAM process. Because Section A is not EUV-compatible, 1b is a practical fit for the existing cleanroom once Micron installs its own equipment in the coming quarters (the existing legacy equipment belongs to PSMC and was not included in the acquisition agreement). (2/10)
Apr 10 5 tweets 2 min read
Jensen showing Rubin Ultra as an MCM was the real tell. This is not just Nvidia gluing more dies together because it feels like the next cool architecture move. It is what happens when reticle limits, power density, yield, and package economics all start forcing the same answer. (1/5)🧵

For people outside packaging, here is the intuitive way to think about it. Chips do not move through OSAT as loose magical objects. They move through a very physical factory flow with trays, boats, carriers, sockets, test handlers, ovens, lid attach, mark, inspection, and shipping constraints. There are standard footprints for how packaged parts are handled. A common JEDEC tray is roughly 12.7 by 5.35 inches (322.6 × 135.9 mm) externally, and once your package starts eating too much of that real estate, everything gets uglier. Fewer units per tray. Worse mechanical margin. Harder handling. More custom tooling. More risk in test and burn-in. Higher cost everywhere. (2/5)
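To see how quickly "fewer units per tray" bites, here is a minimal sketch that counts how many square packages fit in a grid of pockets on a tray of the external size quoted above. The 10 mm edge margin and 3 mm pocket wall are hypothetical values chosen for illustration, not taken from any JEDEC outline.

```python
import math

TRAY_LENGTH_MM = 322.6  # external tray length quoted in the thread
TRAY_WIDTH_MM = 135.9   # external tray width quoted in the thread


def units_per_tray(
    package_mm: float,
    edge_margin_mm: float = 10.0,  # hypothetical unusable border on each side
    pocket_wall_mm: float = 3.0,   # hypothetical wall between adjacent pockets
) -> int:
    """Count square packages that fit in a simple rectangular pocket grid."""

    def pockets(tray_dim_mm: float) -> int:
        # n pockets need n * package + (n - 1) * wall of usable length.
        usable = tray_dim_mm - 2 * edge_margin_mm + pocket_wall_mm
        return max(0, math.floor(usable / (package_mm + pocket_wall_mm)))

    return pockets(TRAY_LENGTH_MM) * pockets(TRAY_WIDTH_MM)


for size in (35, 55, 75, 100, 120):
    print(f"{size} x {size} mm package: {units_per_tray(size)} per tray")
```

Under these assumed margins the count drops from a couple of dozen units for a mid-size package to a handful, and then to none once the package no longer fits across the tray width, which is where custom carriers and tooling come in.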
Apr 3 4 tweets 2 min read
Memory is taking over Hyperscaler CapEx.

In CY23 and CY24, memory was ~8% of total Hyperscaler spend. We estimate it hits 30% in CY26 and moves higher in CY27. That's a near-4x shift in just four years. (1/4) 🧵

What's driving it:
🟠 DRAM prices are expected to more than double in CY26, with another double-digit ASP increase in CY27
🟠 LPDDR5 contract pricing up over 3x since 1Q25. Price likely exceeds $10/GB in 1Q26 on the open market
🟠 HBM remains structurally undersupplied through CY27. AI-based servers already see significant % BOM costs from HBM, before price hikes
🟠 We know B200 server prices are going up 15–20% by year-end

Memory is a massive % of the $250B in incremental hyperscaler spend this calendar year. (2/4)
Mar 26 5 tweets 1 min read
CPO (Co-Packaged Optics) testing is critical because the cost of failure is enormous. Once an optical engine is attached to the switch package, a defect can jeopardize the entire assembly rather than a replaceable module. The CPO test flow follows four sequential phases, each with distinct technical challenges, equipment requirements, and vendor ecosystems. (1/5)Image Phase 1 — Wafer-Level Single-Die Test: The EIC and PIC are tested separately at the wafer level before bonding. The EIC is standard CMOS wafer sort (no new equipment). The PIC requires novel double-sided electro-optical probing — this is where all the new equipment spend goes. (2/5)
Feb 26 9 tweets 4 min read
Everybody knows Alexander Fleming's accidental discovery of penicillin. A forgotten petri dish, a surprising mold contamination, and suddenly – life-saving modern medicine.

Well, the semiconductor industry actually has its own Alexander Fleming and "penicillin moment", and it came from a brief mix-up between an inkwell and a crucible of molten metal.

Meet Jan Czochralski. (1/8)🧵

It's 1916. Czochralski, a Polish chemist, is focused on understanding how metals crystallize. Labs back then were often cluttered, intense places with random objects and inventions scattered around.

One day, while writing notes in his Berlin lab, he goes to dip his pen in an inkwell and misses...

Instead of ink, his pen plunges into a crucible of molten tin sitting nearby. He slowly pulls it out and something bizarre happens.
A thin, shimmering filament of solidified metal emerges, perfectly straight and uniform. (2/8)Image
Feb 25 10 tweets 3 min read
Micron’s $100B megafab in NY is at risk of delay due to just 6 “concerned citizens” and their frivolous lawsuit. (1/10) 🧵

The project has already taken an absurd 1200 days between announcement and groundbreaking. Competitors overseas who began at the same time now have fabs built and running. (2/10)
Feb 22 4 tweets 2 min read
Occasionally we hear pitches for demand response for AI load, in which AI compute clusters reduce their workloads during "peak times" for the electric grid, reducing strain and requiring fewer new generators.

These programs are not new: typically they're tied to an electric bill credit. But to an AI cloud, it's...not worth the money. (1/4)🧵

Let's use rule-of-thumb figures: AI clusters make $12M/MW annually in revenue, and have an EBIT margin of around 20%.

At those numbers, a utility needs to incentivize the cluster at $0.35/kWh (5-10x the price of electricity for a datacenter) for demand response to be worth the loss in revenue.

In capacity market terms, that's more than $6,500/MW-day, or 20x the price cap for PJM's most recent capacity auction. (2/4)
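The rule-of-thumb arithmetic can be sketched as follows. This version counts only the EBIT forgone while a megawatt sits idle, using the $12M/MW revenue and 20% margin from the thread; it is an illustration, not the authors' exact method.

```python
HOURS_PER_YEAR = 8760
DAYS_PER_YEAR = 365
KWH_PER_MWH = 1000


def breakeven_incentive(revenue_per_mw_year: float, ebit_margin: float) -> tuple[float, float]:
    """Return the forgone EBIT per curtailed kWh and per MW-day."""
    ebit_per_mw_year = revenue_per_mw_year * ebit_margin
    per_kwh = ebit_per_mw_year / HOURS_PER_YEAR / KWH_PER_MWH
    per_mw_day = ebit_per_mw_year / DAYS_PER_YEAR
    return per_kwh, per_mw_day


per_kwh, per_mw_day = breakeven_incentive(12_000_000, 0.20)
print(f"forgone EBIT: ${per_kwh:.3f}/kWh, ${per_mw_day:,.0f}/MW-day")
```

The EBIT-only figure comes out at roughly $0.27/kWh and a little over $6,500/MW-day, which matches the capacity-market number above; the $0.35/kWh quoted in the thread is somewhat higher and presumably reflects inputs not spelled out here.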
Feb 10 9 tweets 5 min read
Agent swarms are moving from prompting tricks to math.

Kimi K2.5 trains an orchestrator that spawns + schedules specialist sub-agents in parallel, and reports 3×–4.5× lower wall-clock on WideSearch, plus higher scores.

Anthropic also recently released agent teams, where multiple Claude Code instances work together. It is still experimental but has been used to write Claude's C compiler. (1/9) 🧵

We ran a small benchmark with WideSearch on Claude Teams: 30 unique English tasks, 2 trials each, 30-min timeout, GPT-5.2 judge. We compare it to an Opus 4.6 baseline.

Teams were more expensive and slower on average: Total $93 for baseline vs Total $131 for teams.

Completed runs were 46/60 for the baseline vs 47/60 for teams; the rest hit our 30-min cutoff (WideSearch tasks can run for 3+ hours with 100+ sources). (2/9)
Jan 31 6 tweets 3 min read
Backend networking architecture is instrumental to networking ownership cost analysis. How does networking architecture matter for networking ownership costs? In general, we observe the following trends:

🟠Hyperscalers typically get preferential terms on networking equipment compared to neoclouds;

🟠Infiniband-based networks are more expensive than Ethernet-based networks for the same networking type and cluster size;

🟠3-Layer networks would be more expensive than 2-Layer networks on a per rack server (1 rack server = 72 GPUs in the case of GB200 and GB300) basis; and

🟠All else constant, GB300 networks running on CX-8 NICs are expected to be more expensive than GB200 networks running on CX-7 NICs on a per rack server basis because of the doubling of per GPU bandwidth.
(1/6)🧵

Before we dive into several examples of how architecture influences networking costs, we define the following terminology used to describe networking clusters:

🟠Network layers: Number of switch layers required to connect all GPUs within the same cluster, typically 2 or 3 layers

🟠Rails: Number of pathways into which a server tray can be split, equal to the number of GPUs per server tray (4 in the case of GB200 or GB300 deployments)

🟠Planes: Number of pathways into which a NIC can be split

🟠Attach Rate: GPUs per networking component (or the reverse)
(2/6)
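As a rough illustration of why cluster size decides between 2 and 3 network layers, the sketch below uses the standard non-blocking fat-tree (folded Clos) capacity formula. The 64-port switch radix is a hypothetical example, and the model deliberately ignores rails and planes, which split traffic across parallel networks and change the counts in real deployments.

```python
def max_endpoints(radix: int, layers: int) -> int:
    """Endpoints a non-blocking fat tree of identical switches can connect.

    1 layer: radix, 2 layers: radix^2 / 2, 3 layers: radix^3 / 4.
    """
    return 2 * (radix // 2) ** layers


def layers_needed(endpoints: int, radix: int, max_layers: int = 4) -> int:
    """Smallest number of switch layers that can connect all endpoints."""
    for layers in range(1, max_layers + 1):
        if endpoints <= max_endpoints(radix, layers):
            return layers
    raise ValueError("cluster too large for the given radix and layer cap")


RADIX = 64  # hypothetical switch port count, for illustration only
for gpus in (512, 2048, 8192, 32768):
    print(f"{gpus} endpoints with radix {RADIX}: {layers_needed(gpus, RADIX)} layers")
```

Each added layer means more switches and optics per GPU, which is the mechanism behind 3-layer networks costing more than 2-layer networks on a per rack server basis.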
Jan 29 9 tweets 5 min read
Ever wonder how an Nvidia GB200 NVL72 gets from the factory to the data center? You can’t just throw a state-of-the-art 3,000 lb rack (compared to ~500 lb for a CPU-based server) on a standard FedEx truck. Moving AI infrastructure is more like transporting a heart for transplant than shipping electronics. (1/9) 🧵

A fully loaded AI rack can weigh up to 3,700 lbs. That’s the weight of a Ford Explorer concentrated into a 2x4 foot footprint. Standard pallets would be crushed instantly. These AI racks require custom-engineered reinforced bases with shock-absorbent foam just to keep them from tearing through the floor. (2/9)