The trend in graphics architecture is toward fewer ALUs per SM/CU, because it's difficult to keep utilization high. The shared register file, the cache subsystem, and the instruction fetch, decode, and warp schedulers all have to be expanded to feed the extra ALUs.
At some point it's more advantageous to just add more SM/CUs instead. Kepler -> Maxwell -> Turing went from 192 to 128 to 64 FP32 ALUs per SM. It's only with Ampere that FP32 throughput per SM doubled again, but that's really a datacenter compute focused design.
The easy part of 3D rendering is splitting the scene into small tiles, say 8×8, 16×16, 32×32, or 64×64 pixel groups, which the CUs/SMs handle in parallel. Very wide CUs/SMs aren't advantageous here (utilization suffers); they're more useful for some compute workloads.
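A minimal sketch of that tile split, assuming a 1080p framebuffer and 16×16 tiles (both values are illustrative, not tied to any specific GPU; real hardware does this in fixed-function rasterizer logic):

```python
import math

def tile_grid(width, height, tile=16):
    """Split a framebuffer into square tiles; each tile is an
    independent work item that a CU/SM can process in parallel.
    Illustrative only -- real GPUs do this in hardware."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    # Return the top-left pixel coordinate of every tile.
    return [(x * tile, y * tile) for y in range(rows) for x in range(cols)]

tiles = tile_grid(1920, 1080, tile=16)
print(len(tiles))  # 120 x 68 = 8160 independent tiles per 1080p frame
```

Thousands of independent tiles per frame is why many moderately sized SM/CUs keep busier than a few very wide ones.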
Though there are increasingly more compute shaders that scale better on wider CUs, especially with Async Compute, you still need the underlying subsystems capable of feeding that many ALUs concurrently, and that eats die area and power.
It's basically this: game engines aimed at high-fidelity visuals test and optimize on NV first if they're PC exclusive, then AMD. Cross-platform high-fidelity engines are designed for AMD first, then NV. Intel GPU testing is rarely given the time it needs. You can understand why.
Why test & optimize for Intel GPU architecture & driver stack when almost nobody is going to be AAA gaming on an iGPU? It cannot justify the time & manpower investment. Even some of the PC exclusive engines are poorly optimized for AMD due to low marketshare!
This is going to be a major problem for @IntelGraphics & @Rajaontheedge as they attempt to get into the dGPU market. As great as their engineers are, and I have no doubt they will deliver excellent architecture & hw, the lack of game engine support is what's going to hurt them.