visited my uncle in shenzhen. he’s a gpu smuggler.
he handed me this modified 5090 turbo and said:
"future of AI inference. 32GB each, 8 cards, 256GB total VRAM, under $30k. huaqiangbei doesn’t wait for nvidia."
huaqiangbei is really wild.💀
here’s what he told me:
HGX servers are designed for training huge AI models: power-hungry, liquid-cooled, and crazy expensive.
But for inference (running those models), it’s a different game:
→ You don’t need as much compute
→ You just need enough VRAM to fit the model
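A minimal back-of-envelope sketch of that "fit the model in VRAM" sizing. The numbers are my own assumptions (FP16 weights at 2 bytes/param, a flat KV-cache allowance, ~10% overhead), not anything he quoted:

```python
# Rough VRAM sizing for inference: weights + KV cache + overhead.
# Illustrative numbers only; real serving stacks (vLLM, TensorRT-LLM, etc.) will differ.

def inference_vram_gb(params_b: float, bytes_per_param: float = 2.0,
                      kv_cache_gb: float = 20.0, overhead_frac: float = 0.10) -> float:
    """Estimate total VRAM (GB) to serve a model of `params_b` billion parameters."""
    weights_gb = params_b * bytes_per_param          # e.g. FP16 = 2 bytes per parameter
    return (weights_gb + kv_cache_gb) * (1 + overhead_frac)

cluster_vram_gb = 8 * 32   # 8 modified 5090s at 32 GB each = 256 GB total

for params_b in (70, 120, 405):
    need = inference_vram_gb(params_b)
    verdict = "fits" if need <= cluster_vram_gb else "does NOT fit"
    print(f"{params_b}B params -> ~{need:.0f} GB needed, {verdict} in {cluster_vram_gb} GB")
```

Point being: for a lot of models, the question is purely "does it fit in VRAM," not "how many FLOPS do I have."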
That’s why many AI infra builders use traditional x86 + PCIe servers:
• cheaper
• flexible
• easy to scale horizontally
But there’s a problem: consumer GPUs like 4090/5090 are big and awkward—2.5 to 4 slots wide.