What's the better business model for an AI lab, subscription or API? (1/4)🧵
Recently, we purchased one of each Anthropic/OpenAI subscription plan and ran random long-horizon coding tasks until we exhausted the weekly limit. It's widely believed that a $200/month plan maxes out at ~$2000/month worth of tokens (assuming API pricing). However, we found that the subscriptions are actually far more generous. (2/4)
The margin on a subscription plan is a function of the average utilization. If we assume both companies have 75% API gross margins, this results in the following subscription margins. (3/4)
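A minimal sketch of that relationship, assuming serving cost is simply the non-margin share of the API-priced tokens a subscriber actually uses. The $2000 cap plugged in here is the "widely believed" figure from earlier in the thread, not a measured limit, and the function names are illustrative only.

```python
def subscription_gross_margin(price, max_api_value, utilization, api_gross_margin=0.75):
    """Gross margin of a subscription as a fraction of its price.

    price            -- monthly subscription price in dollars
    max_api_value    -- API-priced value of tokens at 100% utilization (assumed input)
    utilization      -- average fraction of the limit subscribers actually use (0..1)
    api_gross_margin -- assumed API gross margin (75% in the thread)
    """
    # Cost to serve = API-priced value of tokens used * (1 - API gross margin).
    serving_cost = utilization * max_api_value * (1.0 - api_gross_margin)
    return (price - serving_cost) / price


def breakeven_utilization(price, max_api_value, api_gross_margin=0.75):
    """Average utilization at which the subscription margin hits zero."""
    return price / (max_api_value * (1.0 - api_gross_margin))


if __name__ == "__main__":
    PRICE = 200.0
    MAX_API_VALUE = 2000.0  # illustrative: the "widely believed" cap, not a measured one
    for u in (0.10, 0.25, 0.40, 1.00):
        margin = subscription_gross_margin(PRICE, MAX_API_VALUE, u)
        print(f"utilization {u:4.0%} -> subscription gross margin {margin:7.1%}")
    print(f"break-even utilization: {breakeven_utilization(PRICE, MAX_API_VALUE):.0%}")
```

The more generous the real limit is relative to the price, the lower the break-even utilization, which is why average utilization is the number that matters.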
Obviously this is way worse than API overall. However, explicitly nerfing subscriptions leads to huge public backlash, and the rapidly falling cost of intelligence means you'll be able to profitably serve Opus 4.8 level models for $20/month in the near future. We therefore think it's far more likely the labs will withhold new features/models from subscription plans. It will be interesting to see if Mythos ends up being API only. (4/4)
Google's next TPU, codenamed Humufish, is set to use Intel's EMIB-T instead of TSMC CoWoS.
Nearly every leading AI training accelerator today is packaged on a TSMC 2.5D flow, and almost all of it is CoWoS. CoWoS is the industry default, which is exactly why a flagship part moving off it is worth attention.
The core difference. CoWoS places all dies on a single large silicon/RDL interposer. EMIB embeds small silicon bridges directly in the organic substrate, only where die-to-die links are needed. (1/4)🧵
So why EMIB?
🟠 EMIB isn't bound by the interposer reticle limit. A CoWoS silicon interposer is printed by lithography, so its size is capped by the reticle; the monolithic version (CoWoS-S) maxed out near 3.3x the reticle size, which is why TSMC moved to CoWoS-L. EMIB’s small bridges sidestep that cap, so it’s a much more scalable technology.
🟠 Efficiency and cost. EMIB packaging is meaningfully cheaper, since it drops the costly interposer entirely. EMIB also uses silicon far more efficiently than CoWoS. A wafer is round, so large interposers waste area at the edge and yield worse as they grow, while tiny bridges tile densely with little waste. It also gives buyers a second source outside TSMC. (2/4)
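A rough sketch of the round-wafer point, using a common first-order dies-per-wafer approximation (gross area minus an edge-loss term). The interposer area assumes 3.3x a standard 26 mm x 33 mm reticle field; the 50 mm² bridge area is a hypothetical placeholder, not a figure from this thread. Yield, scribe lines, and edge exclusion are ignored.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate of whole dies that fit on a round wafer."""
    gross = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    # Edge-loss term: partial dies along the circumference are unusable.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return max(0, math.floor(gross - edge_loss))


def wafer_utilization(die_area_mm2, wafer_diameter_mm=300.0):
    """Fraction of wafer area that ends up inside whole dies."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return dies_per_wafer(die_area_mm2, wafer_diameter_mm) * die_area_mm2 / wafer_area


if __name__ == "__main__":
    RETICLE_MM2 = 26.0 * 33.0            # standard reticle field
    INTERPOSER_MM2 = 3.3 * RETICLE_MM2   # CoWoS-S-class interposer (assumed size)
    BRIDGE_MM2 = 50.0                    # hypothetical bridge size, for illustration only
    for name, area in (("large interposer", INTERPOSER_MM2), ("small bridge", BRIDGE_MM2)):
        print(f"{name:16s} {area:8.1f} mm^2 -> {dies_per_wafer(area):5d} per wafer, "
              f"{wafer_utilization(area):.0%} of wafer area used")
```

Under these assumptions the large interposer leaves roughly half the wafer unused, while the small bridge uses most of it.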
Humufish is using EMIB-T. The "T" is TSV. Plain EMIB has no vias in the bridge, so power has to detour around it through the substrate, which strains power delivery. EMIB-T sends power vertically straight through the bridge, with added capacitors and a ground plane for cleaner power. That is what makes it ready for next-gen HBM and higher-bandwidth interconnects. (3/4)
INTERESTING: Only 3 months after Rubin Ultra was announced at GTC 2026, the original 4-die Rubin Ultra has been cancelled due to manufacturing execution concerns. The new “Rubin Ultra” is half the size/~ half the real-world performance of the original Rubin Ultra. 1/4🧵
This all comes against the backdrop of NVIDIA’s market share being eroded by Trainium, TPUs, and AMD chips. For NVIDIA to maintain pole position, it must be aggressive in execution. Manufacturing execution issues like this will only lead to more market share being chipped away. 2/4🧵
A good chunk of inference for the most successful AI agent, Claude Code, is done on Trainium, while Claude training is done on TPUs. Just a year ago, it would have been unimaginable that TPUs and Trainium could grow this rapidly, while the CUDA moat slowly eroded. 3/4🧵
One of the most underappreciated ways to play the AI semiconductor buildout may be through materials rather than chips themselves.
As the industry races to produce more advanced semiconductors, demand isn’t just rising for GPUs and wafer fab equipment, it’s rising for the critical materials that make modern chips possible. (1/6)🧵
Tungsten is a great example.
It is one of the most critical materials in semiconductor fabrication, prized for its high-temperature stability and resistance to electrical wear. Fabs rely on CVD to fill the deep, high-aspect-ratio vertical vias that link multi-layered chip architectures, while utilizing PVD to deposit the ultra-thin structural barrier layers surrounding them. Because it spans both core deposition categories, tungsten is completely non-negotiable for advanced chip production. (2/6)
What’s interesting is that supply appears increasingly constrained. High-purity tungsten metal powder is the primary raw material used to manufacture WF₆ (tungsten hexafluoride, the gas used in CVD). The raw supply chain is overwhelmingly dominated by China, which controls roughly 80% of global tungsten mining, refining, and powder processing capacity.
China's exports YTD are down ~50% YoY, and the data shows the pricing pressure global customers are facing on this critical component. (3/6)
BREAKING NEWS: The Founder/CEO of LeptonAI has left only a year after LeptonAI’s acquisition. This is quite shocking, as Jensen reportedly spent $700M acquiring LeptonAI. What did he see? DGX Lepton flopped and got nowhere near the success Jensen expected. 1/7🧵
Initially, NVIDIA claimed that Lepton’s core software platform would be open-sourced by 2026. That has yet to happen. While we were skeptical, we wanted to believe that NVIDIA would open-source the core Lepton software platform, given that Lepton’s CEO is the co-creator of Caffe, ONNX, and PyTorch. 2/7🧵
One speculation for why Lepton’s CEO left is that Jensen ultimately changed his mind and did not approve open-sourcing Lepton. In acquisitions, standard practice is for vesting to happen over multiple years. 3/7🧵
One of the more uncomfortable observations in our AI Value Capture piece is internal: our token spend at SemiAnalysis now runs at roughly 30% of employee compensation, with employees pulling just under 5 billion tokens per month on average, over 5x more than Meta, and our top contributors clearing 100 billion. We wrote about it openly because every research firm, hedge fund, and law firm we know is heading toward a similar number, just on a delay. (1/4)🧵
The substitution math is the part to internalize. Tasks that used to need a junior analyst for several hours (converting a model to a dashboard, building chart packs from earnings, rebuilding a comp set) now resolve in minutes for a few dollars of tokens. The blended Opus 4.7 cost we observe is about $0.99 per million against $5/$25 sticker, mostly because agentic workloads run 300:1 input-to-output ratios and cache hit rates above 90% pull the effective price down. That's a real change in the unit economics of professional services, not a 10% efficiency gain. (2/4)
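A back-of-the-envelope sketch of how a $5/$25 sticker can blend down to roughly $1 per million tokens. It assumes cached input tokens are billed at 10% of the input price and ignores any cache-write premium; both are simplifying assumptions, not figures from this thread.

```python
def blended_price_per_million(input_price, output_price, input_to_output_ratio,
                              cache_hit_rate, cache_read_discount=0.10):
    """Blended $/million tokens across input and output.

    cache_read_discount -- assumed fraction of the input price charged for
                           cached input tokens (0.10 is an assumption here).
    """
    # Effective input price: cached tokens are cheap, uncached pay full price.
    effective_input = (cache_hit_rate * input_price * cache_read_discount
                       + (1.0 - cache_hit_rate) * input_price)
    # Weight input and output prices by their share of total tokens.
    total_parts = input_to_output_ratio + 1.0
    return (input_to_output_ratio * effective_input + output_price) / total_parts


if __name__ == "__main__":
    for hit_rate in (0.0, 0.90, 0.91, 0.95):
        price = blended_price_per_million(5.0, 25.0, 300.0, hit_rate)
        print(f"cache hit rate {hit_rate:4.0%} -> ${price:.2f} per million tokens")
```

With a 300:1 ratio the output price barely matters; almost all of the discount comes from the cache hit rate.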
The throughput math has gotten the most pushback in our reader notes, so it's worth being precise. On the same B300 running DeepSeek R1, baseline FP8 sits near 1,000 tokens/sec/GPU, adding wideEP plus disagg gets you to roughly 8,000, and layering MTP on top pushes it to about 14,000, a 14x gain from software alone. Factor in hardware too and the most optimized GB300 NVL72 hits about 17x the best H100 config in FP8, 32x in FP4. Once you accept that compression is real, model-lab gross margin expansion stops looking like a temporary pricing oddity and starts looking structural. (3/4)
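The software-only steps above can be laid out as a short sketch. The throughput figures are the approximate ones quoted in the thread; the GPU-hour price is normalized to 1.0 (an arbitrary unit, not a real price), so the cost column is only meaningful as a ratio between stages.

```python
# Approximate tokens/sec/GPU on the same B300 running DeepSeek R1 (from the thread).
STAGES = [
    ("FP8 baseline", 1_000),
    ("+ wideEP + disagg", 8_000),
    ("+ MTP", 14_000),
]


def stage_gains(stages):
    """Return (name, throughput, gain vs previous stage, gain vs baseline)."""
    baseline = stages[0][1]
    previous = baseline
    rows = []
    for name, tokens_per_sec in stages:
        rows.append((name, tokens_per_sec, tokens_per_sec / previous, tokens_per_sec / baseline))
        previous = tokens_per_sec
    return rows


def cost_per_million_tokens(gpu_hour_price, tokens_per_sec_per_gpu):
    """Cost of one million tokens given a GPU-hour price and sustained throughput."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_price / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    NORMALIZED_GPU_HOUR_PRICE = 1.0  # arbitrary unit, NOT a real B300 price
    for name, tps, step, total in stage_gains(STAGES):
        cost = cost_per_million_tokens(NORMALIZED_GPU_HOUR_PRICE, tps)
        print(f"{name:18s} {tps:6d} tok/s/GPU  step {step:5.2f}x  "
              f"cumulative {total:5.1f}x  relative cost/M tokens {cost:.4f}")
```

At a fixed GPU-hour price, cost per token falls by the same factor that throughput rises, which is the compression the margin argument rests on.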
H100 Ornn index spot prices are falling, now at $2.42 per hour, roughly 40% below the May peak. The ecosystem is concerned that this is a sign that compute demand, and by extension the appetite for AI, is waning. (1/5)🧵
The important signal is that this is likely a spot price index, not term pricing. Our neocloud survey shows that 1-year H100 contract prices have instead climbed from a trough of roughly $1.70 per hour late last year to about $2.65 per hour today. (2/5)
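A quick sketch of the two moves side by side, using only the rounded figures quoted above. The spot peak is implied from "roughly 40% below" rather than quoted directly, so treat it as approximate.

```python
def implied_peak(current_price, drop_from_peak):
    """Back out the peak price from the current price and fractional decline."""
    return current_price / (1.0 - drop_from_peak)


def pct_change(start_price, end_price):
    """Fractional change from start to end."""
    return end_price / start_price - 1.0


if __name__ == "__main__":
    SPOT_NOW = 2.42          # $/hr, H100 spot index
    SPOT_DROP = 0.40         # "roughly 40% below the May peak"
    CONTRACT_TROUGH = 1.70   # $/hr, 1-year contract trough late last year
    CONTRACT_NOW = 2.65      # $/hr, 1-year contract today

    print(f"implied spot peak:       ${implied_peak(SPOT_NOW, SPOT_DROP):.2f}/hr")
    print(f"spot change from peak:   {-SPOT_DROP:+.0%}")
    print(f"contract change:         {pct_change(CONTRACT_TROUGH, CONTRACT_NOW):+.0%}")
```

The two series move in opposite directions over the period, which is why reading demand off the spot index alone is misleading.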
Spot and on-demand markets are where buyers run POCs, one-off evaluations, burst workloads, and capacity overflow. They can be useful when taken as part of a dataset but are not reflective of where production economics are set. Contract pricing is where sustained workloads show up with the intention of planned, recurring, revenue-bearing inference or training demand. (3/5)