visited my uncle in shenzhen. he’s a gpu smuggler.
he handed me this modified 5090 turbo and said:
"future of AI inference. 32GB each, 8 cards, 256GB total VRAM, under $30k. huaqiangbei doesn’t wait for nvidia."
huaqiangbei is really wild.💀
here’s what he told me: HGX servers are built for training huge AI models. Power-hungry, liquid-cooled, and crazy expensive. But inference (actually running those models) is a different game:
→ you don’t need nearly as much compute
→ you just need enough VRAM to fit the model
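quick back-of-the-envelope on “enough VRAM” (my own rule of thumb, not his): weights ≈ params × bytes-per-param, plus headroom for KV cache and activations. A sketch in Python, assuming a flat ~20% overhead:

```python
# Rough VRAM estimate for holding a model during inference.
# The 1.2x overhead factor is an assumption (KV cache / activations);
# real usage depends on batch size and context length.
def vram_needed_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return params_billion * bytes_per_param * overhead

print(vram_needed_gb(70, 2))    # 70B @ FP16 -> ~168 GB: needs several cards
print(vram_needed_gb(70, 0.5))  # 70B @ FP4  -> ~42 GB: fits on two 24GB 4090s
```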
That’s why many AI infra builders use traditional x86 + PCIe servers:
• cheaper
• flexible
• easy to scale horizontally
But there’s a problem: consumer GPUs like 4090/5090 are big and awkward—2.5 to 4 slots wide.
Enter the blower-style card: double-slot, front-to-back airflow, server-friendly.
Each generation has one. But NVIDIA hates them.
Why? Because a rack full of 4090 blowers replaces an H100 server at 1/10 the cost.
NVIDIA cripples gaming cards on purpose:
🚫 No NVLink after the 3090
🚫 VRAM capped at 24GB on the 4090 / 32GB on the 5090, a fraction of a datacenter card
🚫 No official blower 4090/5090
So if you want dense GPU inference, you either go broke... or go underground.
In Huaqiangbei, engineers reverse-engineered the blower design.
Now they mass-produce 4090 blowers, unofficial and off-NVIDIA’s radar.
They're shipping globally, and account for 90%+ of all 4090 blowers in the wild.
This has accidentally made the 4090 the go-to choice for inference servers because it’s crazy cost-effective. Sure, it doesn’t have NVLink, but with some software wizardry you can still pool the VRAM across cards: 24GB × 8 = 192GB total, enough to run models under 200 billion parameters, or even bigger ones with FP4 quantization.
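the “software wizardry” is usually just tensor parallelism in an inference engine like vLLM: shard the weights across the cards and sync over PCIe instead of NVLink. A minimal sketch (the model id is a placeholder, not anything specific to these cards):

```python
# Sketch: pooling 8 x 24GB of VRAM via tensor parallelism in vLLM.
# Without NVLink the all-reduce between shards runs over PCIe, which is
# slower, but inference traffic is light enough that it usually works out.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-180b-model",  # placeholder model id
    tensor_parallel_size=8,            # shard weights across all 8 cards
)

outputs = llm.generate(
    ["Explain why LLM inference is VRAM-bound."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```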
Huaqiangbei takes it even further. They’ve figured out how to mod 4090s up to 48GB of VRAM, which means you can build an inference server with 384GB of VRAM for under $50k. And right now all those huaqiangbei GPU bros are busy producing blower-style 5090 cards, which my uncle believes will become the next big thing for affordable, high-performance inference servers.
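back-of-the-envelope on what those VRAM budgets buy (same rough ~20% overhead assumption as above, my math, not his):

```python
# Largest model (in billions of params) whose weights fit a VRAM budget,
# assuming ~20% overhead for KV cache / activations.
def max_params_billion(vram_gb: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    return vram_gb / (bytes_per_param * overhead)

for vram in (192, 384):  # 8x24GB stock 4090s vs 8x48GB modded
    for name, bpp in (("FP16", 2), ("FP8", 1), ("FP4", 0.5)):
        print(f"{vram}GB @ {name}: ~{max_params_billion(vram, bpp):.0f}B params")
```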