Dr. Moritz Lehmann
I have discontinued this Twitter account and moved to a much better place called #Mastodon: https://t.co/eE6frW22tk

Jul 27, 2022, 8 tweets

I have benchmarked the new @AMDInstinct MI250 #GPU at @fzj_jsc, and it is disappointing but also impressive. Let me explain.
🧵1/6

cc @HPC_Guru @hpcprogrammer @IanCutress @ProfMatsuoka @sunitachandra29 @aschilling @wkmyrhang @VideoCardz @AMDGPU_
#HPC #Top500 #MI200 #Exascale

The #MI250 is misleadingly marketed as "one chiplet GPU" with 13312C, 90TFLOPs & 128GB @ 3.2TB/s.

But it is not. The 2 GCDs are 2 separate GPUs with 64GB each, like a K80 dual-GPU but in a socket. One #GPU can't directly access the other's memory.
🧵2/6

amd.com/en/products/se…
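To make the per-GCD numbers concrete, here is a minimal Python sketch that halves the package-level figures quoted above. It assumes the two GCDs are identical, so every figure splits evenly; only the 64GB per GCD is stated in the thread itself, and `GpuSpec` and `per_gcd` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    cores: int            # shader cores ("C" in the marketing figures)
    tflops: float         # peak TFLOPs as marketed
    memory_gb: float      # memory capacity in GB
    bandwidth_tbs: float  # memory bandwidth in TB/s


def per_gcd(package: GpuSpec, gcds: int = 2) -> GpuSpec:
    """Split package-level figures evenly across GCDs (assumes identical GCDs)."""
    return GpuSpec(
        cores=package.cores // gcds,
        tflops=package.tflops / gcds,
        memory_gb=package.memory_gb / gcds,
        bandwidth_tbs=package.bandwidth_tbs / gcds,
    )


# Package-level figures as quoted in the thread for the MI250.
mi250_package = GpuSpec(cores=13312, tflops=90.0, memory_gb=128.0, bandwidth_tbs=3.2)

# What a single-GPU application can actually address.
mi250_gcd = per_gcd(mi250_package)
print(mi250_gcd)
```

Software that is not multi-GPU capable sees only the values in `mi250_gcd`, not the package totals.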

To use both GCDs, the software needs to be multi-GPU capable. For many algorithms this is very difficult and for some it is entirely infeasible. The desire for large unified memory is huge.
The #MI250 promises exactly that with "128GB", but delivers only half.
🧵3/6

Although faster on paper, a single #MI200 GCD is much inferior to the #A100 40GB for bandwidth-bound applications (e.g. lattice Boltzmann / #CFD), because its efficiency is about as low as old Nvidia Kepler. It is only a moderate performance increase over #MI100/#RadeonVII.
🧵4/6
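"Efficiency" for a bandwidth-bound code can be read as measured throughput divided by the throughput the datasheet bandwidth would allow. The Python sketch below works that out for lattice Boltzmann under stated assumptions: 153 bytes of memory traffic per cell update (D3Q19 in FP32: 19 loads and 19 stores of 4 bytes plus 1 flag byte), 1600 GB/s per GCD (half of the 3.2TB/s quoted above), and 1555 GB/s for the A100 40GB (taken from Nvidia's datasheet, not from this thread). The function names are hypothetical, and no measured values are included; you would plug in your own benchmark result.

```python
# Assumed memory traffic per lattice cell update (D3Q19, FP32):
# 19 density distribution loads + 19 stores at 4 bytes each, plus 1 flag byte.
BYTES_PER_CELL = 19 * 4 * 2 + 1  # 153


def peak_mlups(bandwidth_gbs: float, bytes_per_cell: int = BYTES_PER_CELL) -> float:
    """Upper bound on million lattice updates per second for a given bandwidth in GB/s."""
    updates_per_second = bandwidth_gbs * 1e9 / bytes_per_cell
    return updates_per_second / 1e6


def efficiency(measured_mlups: float, bandwidth_gbs: float,
               bytes_per_cell: int = BYTES_PER_CELL) -> float:
    """Fraction of the datasheet bandwidth that the measured throughput actually uses."""
    return measured_mlups / peak_mlups(bandwidth_gbs, bytes_per_cell)


# Datasheet bandwidths in GB/s (assumptions, see the text above).
spec_bandwidth_gbs = {
    "MI250 (1 GCD)": 1600.0,
    "A100 40GB": 1555.0,
}

for name, bandwidth in spec_bandwidth_gbs.items():
    print(f"{name}: at most {peak_mlups(bandwidth):.0f} MLUPs/s")
```

On paper the two devices have nearly the same ceiling, so a large gap in measured MLUPs/s translates directly into a gap in `efficiency`.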

More memory (32GB on #MI100 -> 64GB on #MI200) is a very good step in the right direction though. Also the node itself is solid, essentially 8 fast #GPUs in 4 sockets with fast interconnect.
I'm looking forward to doing some large-scale simulations in the coming weeks.
🧵5/6

Huge thanks to @fzj_jsc @AndiH @vitonildo @mj_klemm @AtosBigData for providing me access to that hardware so early!
🧵6/6

I have published the #FluidX3D lattice Boltzmann and #OpenCL memory bandwidth benchmarks in this paper in @PhysRevE:

Now for the fun stuff: a first large-scale simulation on @AMDInstinct #MI250. Even with only 64GB accessible on a GCD, it's one beast of a #GPU! 🖖🤯🖥️🔥

While one GCD of the #MI250 simulates the X-wing, the second GCD can be used for the dark side of the force and simulate TIE fighter aerodynamics. 🖖😈
Both of these large-scale simulations can run at the same time with single-socket @AMDInstinct hardware.
