Dr. Moritz Lehmann

@ProjectPhysX

Jul 27, 2022 • 8 tweets • 12 min read

I have benchmarked the new @AMDInstinct MI250 #GPU at @fzj_jsc, and it is disappointing but also impressive. Let me explain.
🧵1/6

cc @HPC_Guru @hpcprogrammer @IanCutress @ProfMatsuoka @sunitachandra29 @aschilling @wkmyrhang @VideoCardz @AMDGPU_
#HPC #Top500 #MI200 #Exascale

https://twitter.com/fzj_jsc/status/1501945315187834882

The #MI250 is misleadingly marketed as "one chiplet GPU" with 13312C, 90TFLOPs & 128GB @ 3.2TB/s.

But it is not. The 2 GCDs are 2 separate GPUs with 64GB each, like a K80 dual-GPU but in a socket. One #GPU can't directly access the other's memory.
🧵2/6

amd.com/en/products/se…

To use both GCDs, the software needs to be multi-GPU capable. For many algorithms this is very difficult and for some it is entirely infeasible. The desire for large unified memory is huge.
The #MI250 promises exactly that with "128GB", but delivers only half.
🧵3/6
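
To illustrate what "multi-GPU capable" asks of a grid-based code, here is a minimal Python sketch of a slab decomposition along x. It assumes non-periodic boundaries and one halo layer per shared face; `split_domain` and its fields are hypothetical names, not anything from FluidX3D. Each device owns a slab and must additionally hold and exchange halo cells with its neighbor every time step, which is the part a single unified memory pool would make unnecessary.

```python
def split_domain(nx, ny, nz, n_devices, halo=1):
    """Split an nx*ny*nz box into slabs along x, one slab per device.

    Assumption: non-periodic boundaries, so edge slabs have one
    neighbor and interior slabs have two.
    """
    base, rest = divmod(nx, n_devices)
    slabs = []
    start = 0
    for d in range(n_devices):
        # Spread any remainder over the first few devices.
        width = base + (1 if d < rest else 0)
        # Count the faces this slab shares with a neighboring slab.
        n_shared_faces = (1 if d > 0 else 0) + (1 if d < n_devices - 1 else 0)
        slabs.append({
            "device": d,
            "x_range": (start, start + width),
            "owned_cells": width * ny * nz,
            # Cells copied from neighbors before every time step.
            "halo_cells": n_shared_faces * halo * ny * nz,
        })
        start += width
    return slabs


# Two devices, as with the two GCDs of one MI250 (toy grid size).
for slab in split_domain(8, 4, 4, n_devices=2):
    print(slab)
```

The halo exchange is cheap to count but not to implement: every kernel that reads neighbor cells has to be aware of where the slab ends.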

Although faster on paper, a single #MI200 GCD is much inferior to the #A100 40GB for bandwidth-bound applications (e.g. lattice Boltzmann / #CFD), because its efficiency is about as low as that of old Nvidia Kepler. It is only a moderate performance increase over #MI100/#RadeonVII.
🧵4/6
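
For bandwidth-bound codes, "efficiency" is usually read as measured throughput divided by the ceiling that memory bandwidth allows. A rough Python sketch of that ratio follows. It assumes 1.6 TB/s per GCD (half of the marketed 3.2 TB/s) and 153 bytes moved per cell update, a figure commonly quoted for a D3Q19 lattice in FP32 (19 loads and 19 stores of 4 bytes each, plus 1 flag byte). Both constants and the function names are illustrative assumptions, not measurements from this thread.

```python
# Assumption: one GCD gets half of the marketed 3.2 TB/s.
GCD_BANDWIDTH_GB_S = 1600.0
# Assumption: D3Q19, FP32: (19 loads + 19 stores) * 4 bytes + 1 flag byte.
BYTES_PER_CELL = 153


def peak_mlups(bandwidth_gb_s, bytes_per_cell):
    """Bandwidth-limited ceiling in million lattice updates per second."""
    cells_per_second = bandwidth_gb_s * 1e9 / bytes_per_cell
    return cells_per_second / 1e6


def bandwidth_efficiency(measured_mlups, bandwidth_gb_s, bytes_per_cell):
    """Fraction of the bandwidth-limited ceiling that a measurement reaches."""
    return measured_mlups / peak_mlups(bandwidth_gb_s, bytes_per_cell)


ceiling = peak_mlups(GCD_BANDWIDTH_GB_S, BYTES_PER_CELL)
print(f"bandwidth-limited ceiling: {ceiling:.0f} MLUPs")
```

Plugging a benchmarked MLUPs value into `bandwidth_efficiency` gives the fraction the comparison above refers to.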

More memory (32GB on #MI100 -> 64GB on #MI200) is a very good step in the right direction though. Also the node itself is solid, essentially 8 fast #GPUs in 4 sockets with fast interconnect.
I'm looking forward to doing some large-scale simulations in the coming weeks.
🧵5/6

Huge thanks to @fzj_jsc @AndiH @vitonildo @mj_klemm @AtosBigData for providing me access to that hardware so early!
🧵6/6

I have published the #FluidX3D lattice Boltzmann and #OpenCL memory bandwidth benchmarks in this paper in @PhysRevE:

https://twitter.com/ProjectPhysX/status/1552225695044190212

Now for the fun stuff: a first large-scale simulation on @AMDInstinct #MI250. Even with only 64GB accessible on a GCD, it's one beast of a #GPU! 🖖🤯🖥️🔥

https://twitter.com/ProjectPhysX/status/1552659086822588416

While one GCD of the #MI250 simulates the X-wing, the second GCD can be used for the dark side of the Force to simulate TIE fighter aerodynamics. 🖖😈
Both of these large-scale simulations can run at the same time with single-socket @AMDInstinct hardware.

https://twitter.com/ProjectPhysX/status/1555174589365370882

More from @ProjectPhysX

Oct 7, 2022

I did @officialBinotto's @ScuderiaFerrari SF71H in @FluidX3D #CFD on a supercomputer.
- 1s in real life @ 100km/h
- 20s 4K60 video (3x)
- 14h compute on 8x @AMDInstinct #MI200 64GB #GPU
- 144TB data visualized
What I found is absolutely wild. A #SimulationFriday #F1 thread: 🧵1/5

The @FluidX3D simulation was done at 10 billion voxel grid resolution (2152×4304×1076), over 217k time steps (1 second), at Re=3.75M (100km/h).
The fins on the front spoiler create a turbulent boundary layer and kick it up onto the front wheels to reduce drag. 🧵2/5
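
The grid numbers above can be checked with a few lines of Python. This is only arithmetic on the figures quoted in the tweet; "217k" is taken as exactly 217,000 steps, which is an assumption since the tweet rounds it.

```python
# Grid dimensions as quoted in the tweet.
NX, NY, NZ = 2152, 4304, 1076
# Assumption: "217k" is treated as exactly 217,000 steps.
TIME_STEPS = 217_000
SIMULATED_SECONDS = 1.0

cells = NX * NY * NZ
dt_microseconds = SIMULATED_SECONDS / TIME_STEPS * 1e6
total_cell_updates = cells * TIME_STEPS

print(f"cells: {cells:,}")
print(f"time step: {dt_microseconds:.2f} microseconds")
print(f"total cell updates: {total_cell_updates:.3e}")
```

The cell count comes out just under 10 billion, matching the "10 billion voxel" figure.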

The streamlined chassis guides airflow under the spoiler to create downforce. The halo - one of the best additions to the sport in terms of safety - is rather aerodynamic.
Each frame of the video is 120GB, 144TB for 1201 frames. @FluidX3D renders the data directly in VRAM. 🧵3/5
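
The data volume follows from the quoted frame size and frame count. The Python sketch below redoes that arithmetic and also derives the bytes per grid cell per frame. Decimal units (1 GB = 1e9 bytes) are an assumption, and the observation that about 12 bytes per cell would fit three 4-byte values is an inference, not something stated in the thread.

```python
# Figures quoted in the thread.
FRAMES = 1201
GB_PER_FRAME = 120
CELLS = 2152 * 4304 * 1076

# Assumption: decimal units, 1 GB = 1e9 bytes and 1 TB = 1000 GB.
total_tb = FRAMES * GB_PER_FRAME / 1000
bytes_per_cell = GB_PER_FRAME * 1e9 / CELLS

# 20 s of video at 60 frames per second, counting both end frames.
frames_from_video = 20 * 60 + 1

print(f"total data visualized: {total_tb:.2f} TB")
print(f"bytes per cell per frame: {bytes_per_cell:.2f}")
print(f"frames in a 20 s 60 fps video: {frames_from_video}")
```

Rendering in VRAM matters here because writing 120GB per frame to disk would dominate the run time.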