Dr. Moritz Lehmann
Jul 27, 2022 • 8 tweets • 12 min read
The #MI250 is misleadingly marketed as "one chiplet GPU" with 13312 cores, 90TFLOPs & 128GB @ 3.2TB/s.

But it is not. The 2 GCDs are 2 separate GPUs with 64GB each, like a K80 dual-GPU but in a socket. One #GPU can't directly access the other's memory.
🧵2/6

amd.com/en/products/se…
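A minimal Python sketch of the memory model described above, using only the thread's figures (two GCDs with 64GB each); the class and function names are hypothetical. It shows that the advertised 128GB is a sum over two separate devices, while code that is not multi-GPU capable is limited to the memory of one GCD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """One GPU as seen by the software (on MI250: one GCD)."""
    name: str
    memory_gb: float


# MI250 as described in the thread: two separate 64GB GCDs in one socket.
MI250 = [Device("GCD0", 64.0), Device("GCD1", 64.0)]


def marketed_memory_gb(devices):
    """Memory summed over all devices (the "128GB" on the spec sheet)."""
    return sum(d.memory_gb for d in devices)


def fits_single_gpu(footprint_gb, devices):
    """Software that is not multi-GPU capable must fit into ONE device's memory."""
    return footprint_gb <= max(d.memory_gb for d in devices)


def fits_multi_gpu(footprint_gb, devices):
    """Multi-GPU software can split its data across devices (halo overhead ignored)."""
    return footprint_gb <= marketed_memory_gb(devices)


print(marketed_memory_gb(MI250))      # 128.0
print(fits_single_gpu(100.0, MI250))  # False: 100GB does not fit on one 64GB GCD
print(fits_multi_gpu(100.0, MI250))   # True, but only with multi-GPU software
```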
To use both GCDs, the software needs to be multi-GPU capable. For many algorithms this is very difficult and for some it is entirely infeasible. The desire for large unified memory is huge.
The #MI250 promises exactly that with "128GB", but delivers only half.
🧵3/6
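To illustrate why multi-GPU capability is extra work, here is a toy sketch in Python/NumPy. It uses a 1D diffusion stencil rather than lattice Boltzmann, and it is not how FluidX3D implements multi-GPU; the function names are hypothetical. Once the grid is split across two devices, the cells at the cut need values that live in the other device's memory, so ghost (halo) cells have to be exchanged every time step.

```python
import numpy as np

ALPHA = 0.25  # diffusion number; the explicit scheme is stable for ALPHA <= 0.5


def step_global(u):
    """One explicit diffusion step on the whole periodic domain (single GPU)."""
    return u + ALPHA * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))


def split(u):
    """Split u into two halves, each padded with one ghost cell per side."""
    n = len(u) // 2
    left = np.zeros(n + 2)
    right = np.zeros(len(u) - n + 2)
    left[1:-1], right[1:-1] = u[:n], u[n:]
    return left, right


def step_split(left, right):
    """The same step on two subdomains, as two GPUs without shared memory would do it."""
    # Halo exchange: copy edge values from the neighbouring subdomain (periodic wrap).
    left[0], left[-1] = right[-2], right[1]
    right[0], right[-1] = left[-2], left[1]
    # Update interior cells only; ghost cells are refreshed next step.
    for sub in (left, right):
        sub[1:-1] = sub[1:-1] + ALPHA * (sub[:-2] - 2.0 * sub[1:-1] + sub[2:])
    return left, right


rng = np.random.default_rng(0)
u = rng.random(16)
left, right = split(u)
for _ in range(100):
    u = step_global(u)
    left, right = step_split(left, right)

merged = np.concatenate([left[1:-1], right[1:-1]])
print(np.allclose(u, merged))  # True: the split run reproduces the single-domain run
```

On real hardware, the halo copies in `step_split` become device-to-device transfers every time step, and for algorithms with less local data dependencies the cut is much harder to handle.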
Although faster on paper, a single #MI200 GCD is much inferior to the #A100 40GB for bandwidth-bound applications (e.g. lattice Boltzmann / #CFD), because its memory bandwidth efficiency is about as low as that of old Nvidia Kepler GPUs. It is only a moderate performance increase over #MI100/#RadeonVII.
🧵4/6
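For bandwidth-bound codes, throughput is roughly capped by the achieved memory bandwidth divided by the bytes moved per cell and time step, where "efficiency" is the fraction of the theoretical bandwidth a real kernel reaches. A minimal sketch, assuming a D3Q19 lattice with FP32 storage where each cell loads and stores its 19 populations plus one flag byte; the per-GCD bandwidth (half of 3.2TB/s) and the 50% efficiency are placeholders, not measurements from the thread.

```python
def bytes_per_cell_step(q=19, bytes_per_value=4, flag_bytes=1):
    """Memory traffic per cell and time step: load + store q populations, plus a flag byte."""
    return 2 * q * bytes_per_value + flag_bytes


def estimated_mlups(bandwidth_tb_s, efficiency, bytes_per_step):
    """Bandwidth-bound estimate in million lattice updates per second (MLUPs).

    efficiency = achieved / theoretical memory bandwidth, between 0 and 1.
    """
    achieved_bytes_per_s = bandwidth_tb_s * 1e12 * efficiency
    return achieved_bytes_per_s / bytes_per_step / 1e6


b = bytes_per_cell_step()  # 2 * 19 * 4 + 1 = 153 bytes (assumed D3Q19, FP32 storage)
# Assumption: one GCD gets half of the advertised 3.2TB/s; 0.5 is a placeholder efficiency.
print(b, round(estimated_mlups(1.6, 0.5, b)))  # 153 5229
```

Because the estimate scales linearly with efficiency, a GPU with a higher theoretical bandwidth can still lose to one that reaches a larger fraction of its own bandwidth, which is the point of the comparison with the A100.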
More memory (32GB on #MI100 -> 64GB on #MI200) is a very good step in the right direction though. Also, the node itself is solid: essentially 8 fast #GPUs in 4 sockets with a fast interconnect.
I'm looking forward to doing some large-scale simulations in the coming weeks.
🧵5/6
Huge thanks to @fzj_jsc @AndiH @vitonildo @mj_klemm @AtosBigData for providing me access to that hardware so early!
🧵6/6

I have published the #FluidX3D lattice Boltzmann and #OpenCL memory bandwidth benchmarks in this paper in @PhysRevE:
Now for the fun stuff: a first large-scale simulation on @AMDInstinct #MI250. Even with only 64GB accessible on a GCD, it's one beast of a #GPU! 🖖🤯🖥️🔥
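How large a grid fits into the 64GB of one GCD? A back-of-the-envelope sketch, assuming roughly 55 bytes of VRAM per cell (our assumption for a D3Q19 lattice with FP16-compressed storage; the thread does not state it) and treating 64GB as 64×10^9 bytes; the usable amount is lower once other buffers are allocated. The function names are hypothetical.

```python
def max_cells(memory_gb, bytes_per_cell):
    """Number of cells that fit into memory_gb (1 GB taken as 10**9 bytes)."""
    return int(memory_gb * 10**9) // bytes_per_cell


def max_cubic_edge(memory_gb, bytes_per_cell):
    """Edge length N of the largest N x N x N grid that fits (exact integer cube root)."""
    cells = max_cells(memory_gb, bytes_per_cell)
    n = round(cells ** (1 / 3))  # float estimate, corrected exactly below
    while n**3 > cells:
        n -= 1
    while (n + 1) ** 3 <= cells:
        n += 1
    return n


# Assumption: ~55 bytes of VRAM per cell.
print(f"{max_cells(64, 55):,} cells")  # 1,163,636,363 cells
print(max_cubic_edge(64, 55))          # 1051, i.e. a 1051^3 grid
```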
While one GCD of the #MI250 simulates the X-wing, the second GCD can be used for the dark side of the Force and simulate TIE fighter aerodynamics. 🖖😈
Both of these large-scale simulations can run at the same time with single-socket @AMDInstinct hardware.
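Running two independent simulations side by side only requires that each process sees a different GCD. A minimal Python sketch, assuming a ROCm software stack where the `ROCR_VISIBLE_DEVICES` environment variable restricts which GPUs a process can see; the `./simulation` binary and its arguments are hypothetical placeholders, and which device index corresponds to which GCD depends on the node.

```python
import os
import subprocess


def launch_per_gcd(jobs, dry_run=True):
    """Start one independent process per GCD.

    jobs: list of (gcd_index, argv) pairs. ROCR_VISIBLE_DEVICES limits the
    GPUs the ROCm runtime exposes, so each process sees only its own GCD.
    With dry_run=True, returns (device_string, argv) pairs instead of launching;
    otherwise returns subprocess.Popen handles for the caller to wait() on.
    """
    launched = []
    for gcd, argv in jobs:
        env = dict(os.environ, ROCR_VISIBLE_DEVICES=str(gcd))
        if dry_run:
            launched.append((env["ROCR_VISIBLE_DEVICES"], argv))
        else:
            launched.append(subprocess.Popen(argv, env=env))
    return launched


# Hypothetical binary and arguments, for illustration only.
plan = launch_per_gcd([(0, ["./simulation", "xwing"]), (1, ["./simulation", "tie_fighter"])])
for gcd, argv in plan:
    print(f"GCD {gcd}: {' '.join(argv)}")
```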


More from @ProjectPhysX

Oct 7, 2022
I did @officialBinotto's @ScuderiaFerrari SF71H in @FluidX3D #CFD on a supercomputer.
- 1s in real life @ 100km/h
- 20s 4K60 video (3x)
- 14h compute on 8x @AMDInstinct #MI200 64GB #GPU
- 144TB data visualized
What I found is absolutely wild. A #SimulationFriday #F1 thread: 🧵1/5
The @FluidX3D simulation was done at 10 billion voxel grid resolution (2152×4304×1076), over 217k time steps (1 second), at Re=3.75M (100km/h).
The fins on the front spoiler create a turbulent boundary layer and kick it up onto the front wheels to reduce drag. 🧵2/5
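The grid figures can be cross-checked with a few lines of Python. The even split over the 8 GCDs and the 55 bytes of VRAM per cell are our assumptions, not stated in the thread; the grid dimensions and the 217k steps per second of real time are as quoted.

```python
# Grid dimensions as quoted in the tweet.
nx, ny, nz = 2152, 4304, 1076
cells = nx * ny * nz
print(f"{cells:,} cells")  # 9,966,135,808 cells, i.e. ~10 billion voxels

# Assumptions: even split over 8 GCDs, ~55 bytes of VRAM per cell.
gcds = 8
bytes_per_cell = 55
cells_per_gcd = cells // gcds
bytes_per_gcd = cells_per_gcd * bytes_per_cell
print(f"{cells_per_gcd:,} cells and {bytes_per_gcd / 2**30:.1f} GiB per GCD")
# 1,245,766,976 cells and 63.8 GiB per GCD

# Physical time covered by one step (217k is rounded in the tweet).
dt_us = 1e6 / 217_000
print(f"{dt_us:.1f} us per time step")  # 4.6 us per time step
```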
The streamlined chassis guides airflow under the spoiler to create downforce. The halo, one of the best additions to the sport in terms of safety, is rather aerodynamic.
Each frame of the video is 120GB, 144TB for 1201 frames. @FluidX3D renders the data directly in VRAM. 🧵3/5
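The data-volume figures can also be checked. A minimal sketch, assuming the 1201 frames are 20 s at 60 fps plus the initial frame at t = 0, and that one frame corresponds to a 3-component FP32 velocity field over the full grid; both are our assumptions, while 120GB per frame is as stated.

```python
fps, seconds = 60, 20
# Assumption: 20 s at 60 fps plus the initial frame at t = 0.
frames = fps * seconds + 1
bytes_per_frame = 120e9  # 120GB per frame, as stated in the tweet
total_tb = frames * bytes_per_frame / 1e12
print(frames, round(total_tb, 1))  # 1201 144.1

# Assumption: one frame = 3-component FP32 velocity field of the ~10-billion-cell grid.
cells = 2152 * 4304 * 1076
frame_gb = cells * 3 * 4 / 1e9
print(round(frame_gb, 1))  # 119.6, consistent with "120GB"
```

Since the renderer works on the data in VRAM, none of these ~144TB has to be written to disk.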