In January, I speculated how "the most impactful outcome from DeepSeek's rise may ultimately be closer collaboration with Huawei and other chip designers".
We now have direct evidence of this collaboration, with potential standardization around UE8M0 as the first major tangible result.
While some may dismiss this as esoteric technical jargon relevant only to AI, computer science, or math enthusiasts ... I will try to explain here in plain language some key market and geopolitical implications of this development.
First I want to acknowledge others who are much closer to DeepSeek and AI for raising, highlighting, and explaining these recent developments, particularly @teortaxesTex @zephyr_z9 and @Compute_King
In this 🧵 I am merely synthesizing the insights and knowledge gained from following their timelines and trying to add value by layering on market and geopolitical insights.
In particular I highly recommend first digesting this post on the technical and strategic implications of UE8M0:
UE8M0 is a custom 8-bit floating-point (FP8) number format proposed by DeepSeek (per the name: unsigned, 8 exponent bits, 0 mantissa bits), designed to improve memory efficiency + hardware compatibility with next-gen AI chips.
(How numbers are represented is critical in AI training due to the gazillion calculations that are performed over the course of a training run. Each operation represents some quanta of time and energy, which ultimately translates to cost.)
One of DeepSeek's more notable innovations was relying on FP8 to train its v3 model released last December, instead of the 16-bit FP16 format. They found the increase in efficiency was worth the trade-off in accuracy (which could be mitigated through other algorithmic optimizations).
You may recall the ridiculous debate over the "real" cost of training DeepSeek v3 (and how many chips they really used to train the model 👇).
Well, relying on FP8 was a key reason why DeepSeek's training costs were so much lower than other models.
Notably, to implement FP8, DeepSeek had to use nVidia chips.
Domestic alternatives like Huawei's latest Ascend chips did not natively support FP8 (aside: this is why claims that reliance on Huawei was what delayed DeepSeek made little sense 👇).
But while nVidia provides native support for two variants of FP8 (E4M3 and E5M2), DeepSeek is now moving to UE8M0 with v3.1, which nVidia does not (yet) natively support.
By announcing model standardization around UE8M0, DeepSeek creates a new space for China's domestic AI chip industry to fill; its chipmakers now find themselves in the novel and unfamiliar position of having first-mover advantage instead of constantly having to play catch-up.
If standardization around UE8M0 proves to deliver key technical benefits — and we won't know until we see it widely implemented and deployed — China will have taken a rather significant step to "fork" the AI development paradigm away from a hardware-dominated ecosystem.
This standards-setting has historical precedent:
It reminded me of the debate between "polar coding" and "low-density parity check" (LDPC) codes during the 3GPP-led 5G standards process in the 2010s.
(note: both methods were eventually incorporated into the 5G standards)
Indeed, the 5G wars merely presaged continued future "forking" and bifurcation in global technology standards, as I also speculated on in this 2018 piece.
While the original "DeepSeek moment" had clear implications for (the shift to) AI inference 👇 and downstream use cases, the implications of a forked, Chinese-centered AI development ecosystem now directly impact the training stage too.
IOW, while DeepSeek's v3 release already made it quite clear that nVidia would not be able to maintain a monopoly position in AI inference, its CUDA-driven moat in training seemed like it might hold (e.g. DeepSeek had to continue using nVidia chips to train), allowing it to maintain near-monopoly global market share there (including in China) for the foreseeable future.
By potentially shifting "standards-setting" power from the hardware layer to the software (model) layer, this development strikes directly at nVidia's moat in training.
P.S. This does not mean nVidia is "toast"!
First, it is not yet clear whether the technical benefits of the new UE8M0 standard will be realized.
Second, nVidia is still likely to maintain its dominant market position in the Western tech ecosystem.
Third, driving further innovation in AI will only increase demand for nVidia chips; any decline in market share can be outweighed by an expanding TAM.
Last and perhaps most significantly, nVidia got to this point by being a rockstar company with elite technical capabilities. If UE8M0 delivers those technical benefits, nVidia will move rapidly to support it.
From a geopolitics perspective, the Chinese AI ecosystem is now taking an increasing leadership role and driving standards.
While the Western tech ecosystem is likely to remain mostly protected and "safe" from Chinese competition, Chinese and Western AI ecosystems will compete directly in "neutral" battleground states around the world.
This map of the now-defunct Biden-era U.S. diffusion-rule tiers provides a decent proxy for levels of this neutrality, at least from the U.S. perspective.
(notably, India might also have changed colors in recent weeks)
These recent developments are also yet another blow to the original rationale for U.S. export controls — whose main/explicit goal was to contain the progress of Chinese AI development.
Export controls on advanced chip fabrication merely forced the Chinese tech ecosystem to pursue optimizations elsewhere in the technology stack.
We saw that with DeepSeek selecting FP8 for v3 and getting "low level" with PTX to unlock untapped horsepower in export-restriction-crippled nVidia H800s; we saw it in Huawei leveraging core expertise in optical networking with CloudMatrix384.
And now with DeepSeek’s close collaboration with China’s AI chip ecosystem, export controls have provided the missing impetus to transform a previously chaotic and fragmented sector into a more coordinated, unified effort.
The last one is analogous to the coordinating effect that export controls had in accelerating China's SME (semiconductor manufacturing equipment) sector.
Chinese AI development has not been impeded in any apparent way and it may even be leading now in downstream use.
Export controls have not made AI usage more expensive or materially less capable in China. Pricing is an order of magnitude lower:
With such low pricing, it is not surprising that commercialization and token usage (a proxy for downstream use cases) are skyrocketing and may even exceed the U.S. now.
Export controls traded dubious short-run benefits — the fanciful theory that achieving AGI/ASI would create some insurmountable long-term technology/economic advantage ...
... for long-term erosion in some of the most powerful economic and technology moats that American companies like nVidia had built up through decades of industry leadership.
IMO a massive self-own, and possibly a consequence of technology-related decisions with massive long-term strategic implications being made in a society dominated by lawyers instead of engineers 👇.
(h/t to @danwwang and his upcoming new book "Breakneck")
Productivity is what ultimately drives per-capita economic growth and increases in living standards over the long run. This concept is one of the pillars of development economics.
I’ve come to realize that one of the fundamental issues with the Pettis/Setser economic framing is its misplaced reliance on accounting identities, with little to no consideration of productivity effects.
To wit: nowhere in this thread is there any mention or consideration of how this sectoral shift impacts productivity.
In the short to medium run, there can certainly be supply-demand disequilibrium where “weak demand” is an issue.
e.g. in this 🧵 from a year ago I tried to quantify the headwinds from the reverse wealth effect of the policy-driven pivot away from real estate toward manufacturing since 2020, and how they could offset the wage growth driven by the underlying productivity gains that the sectoral shift enables.
But given enough time, markets adjust to find new equilibrium points. The more dynamic the economy, the quicker the adjustment.
Growth in per-capita income (and wealth) in the long run must be driven primarily by increases in productivity. Since demand is derived from income 👇, this means productivity growth also drives demand.
After three years of housing price declines — in line with forecasts like 👇 from two years ago — real estate has stabilized and, from a GDP perspective, is no longer a significant drag.
> “The state cannot allocate capital more efficiently than the market.”
An oft-repeated axiom chanted like a religious mantra and accepted by many as a universal truth.
But one that can be easily debunked with a straightforward contra-example from one of the most capital-intensive industries of them all: passenger rail.
China Railway (SOE) vs. Brightline (private)
CR HSR:
▪️ 48,000 km of greenfield track, predominantly elevated on viaducts
▪️ Serves 3.6B passengers annually
▪️ ¥550B of revenue on ¥5T of capital investment (9 years revenue payback)
▪️ 42 fatalities over 17+ years and 23B passenger rides
Brightline Florida:
▪️ 376 km of refurbished at-grade track
▪️ Serves 2.8M passengers per year
▪️ $187M revenue on at least $5.5B capital investment (29 years revenue payback)
▪️ Caused 182 fatalities in two-plus years of operation (hint: maybe you shouldn’t run fast trains over at-grade crossings).
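The payback figures above are straightforward arithmetic — capital invested divided by annual revenue. A quick check using the numbers as stated:

```python
# Revenue payback = capital invested / annual revenue (figures from above)
cr_payback = 5_000e9 / 550e9        # ¥5T capex / ¥550B revenue per year
brightline_payback = 5.5e9 / 187e6  # $5.5B capex / $187M revenue per year

print(round(cr_payback, 1))          # ~9.1 years
print(round(brightline_payback, 1))  # ~29.4 years
```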
Did the private company really do a better job allocating capital here? (rhetorical)
So no, I don’t think “the private sector is always better at allocating capital than the state sector” should be simply accepted as a universal truth, unchallenged.
It depends on the industry and the type of capital formation and the level of state/institutional capacity.
The more interesting, less-ideological exercise is to figure out the optimal ratio of state vs. private involvement on a sector-by-sector basis. This one requires actual nuance and complex thought.
Re-invigoration of biking culture in China and the relentless expansion of dedicated biking lanes today can really be traced to the invention and proliferation of dockless bike-share systems starting around a decade ago.
The ~830B barrels of proven oil reserves in the Middle East have an effective energy equivalent of around 11,300 GW of solar PV producing over a 25-year useful life.
At 14 km2/GW, this would take up desert space of ~158,200 km2, which is less than a quarter of China’s portion of the Gobi Desert.
Moreover, regular maintenance and replacement mean this infrastructure would produce energy in perpetuity, while the Middle East oil fields run out or become more costly and difficult to extract (even with improved extraction technology).
China is currently deploying solar PV at a run rate of 300+ GW per annum, which means at just current run rates it could deploy this volume of solar PV in about 38 years.
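A back-of-envelope check of the oil-vs-solar equivalence. The per-barrel energy content, thermal-to-useful conversion efficiency, and solar capacity factor below are my own illustrative assumptions, chosen as plausible round numbers, not figures from the thread; the land-area and deployment-time lines follow directly from the thread's stated numbers:

```python
# Assumptions (mine, illustrative): ~1,700 kWh thermal per barrel of oil,
# ~38% conversion to useful energy, ~22% solar capacity factor.
BARRELS = 830e9
KWH_PER_BARREL_THERMAL = 1_700
CONVERSION_EFF = 0.38
CAPACITY_FACTOR = 0.22
LIFETIME_YEARS = 25
HOURS_PER_YEAR = 8_760

useful_kwh = BARRELS * KWH_PER_BARREL_THERMAL * CONVERSION_EFF
kwh_per_gw = 1e6 * HOURS_PER_YEAR * LIFETIME_YEARS * CAPACITY_FACTOR
solar_gw = useful_kwh / kwh_per_gw   # ~11,000 GW, near the ~11,300 cited

land_km2 = 11_300 * 14               # 158,200 km2 at 14 km2/GW, as stated
deploy_years = 11_300 / 300          # ~38 years at the current run rate
```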
Remember it also took multiple decades to develop the vast oil fields of the Middle East, starting in the 1930s and 40s.