Locuza
7 Feb, 21 tweets, 7 min read
This was a nightmare project to work on, with a Frankenstein mash-up of audio recordings, but I can't muster the strength and time to re-record and re-cut the whole thing.

Topic and details are quite interesting though.
Summary pictures follow this thread.

1) Xbox Series X/S die shots scaled to relative true size.
(It's not super accurate, though.)
2) PS4 & Xbox One die shots scaled to relative true size.
3) ^ with annotations

If people are curious about the PS4/Xbox One generation, I could make an extra video on it.
1) PS4 Pro and Xbox One X die shots scaled to relative true size.
2) ^ with annotations

Again, if someone wants a deep dive on those chips, I could make a video on it.
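Scaling die shots to "relative true size" boils down to giving every image the same millimetres-per-pixel ratio. Below is a minimal sketch, assuming each photo is cropped exactly to the die edges, pixels are square, and the die area in mm² is known from some external source. The function names and any example numbers are hypothetical; this is not necessarily how the pictures in this thread were made.

```python
import math


def mm_per_px(width_px: int, height_px: int, die_area_mm2: float) -> float:
    """Edge length of one pixel in mm, assuming the image is cropped to the die."""
    return math.sqrt(die_area_mm2 / (width_px * height_px))


def scale_to_common(shots: dict[str, tuple[int, int, float]]) -> dict[str, tuple[int, int]]:
    """Return new (width, height) in px so that all shots share one mm/px ratio.

    shots maps a name to (width_px, height_px, die_area_mm2).
    The coarsest mm/px is used as the target, so no image gets upscaled.
    """
    scales = {name: mm_per_px(w, h, a) for name, (w, h, a) in shots.items()}
    target = max(scales.values())
    return {
        # physical width (w * mm/px) divided by the common mm/px gives the new pixel width
        name: (round(w * scales[name] / target), round(h * scales[name] / target))
        for name, (w, h, _) in shots.items()
    }
```

After resizing, equal pixel counts mean equal silicon area, which is what makes side-by-side die comparisons meaningful. Any error in the assumed die areas or in the cropping propagates directly into the result, which is one reason such comparisons are "not super accurate".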
Thanks to stblr's firmware extraction, we got great insight into the fine mixture of many different hardware IP blocks used in a single GPU or APU chip.
This enables us to make a relatively precise comparison between RDNA1, RDNA2 and the Xbox Series chips, and to check whether the Xbox Series X/S are really based on RDNA2 and what differences exist.
There are multiple differences, and they are quite interesting to talk about.
For example, the codename Navi21 Lite is pointless, in the sense that the XSX has no direct relationship with Navi21.
It uses a unique hw configuration and different hw IPs in many places.
The codename also could have been Navi22 Lite, Navi X Lite or Ketchup 27, since there is little to no meaning behind it.
---
Many IP blocks on the XSX are older than on Navi2X; however, some are also newer than on Navi2X, and some are even older than on Navi1X GPUs.
Probably many hw IP blocks have their own development timelines, and depending on the dynamics of the whole chip project, some aspects are settled sooner than on other chips, leading to such a funny mixture of older and newer IP blocks per chip project.
Though it also should be noted that not every IP version is directly comparable.
For example, the UMC version for HBM memory on Navi12 is 6.5, compared with 8.X for the GDDR6 UMCs on Navi1X/2X GPUs.
The NBIF block is also such a candidate.
It starts with 2.x or 3.x on Navi1X and Navi2X but has the version number 7.4.2 on the Xbox Series X.
AMD's upcoming Van Gogh and Rembrandt APUs also start with 7.X.
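Since version numbers only order releases within one IP development line, a tool that compares IP versions between chips should refuse to compare across lines. A minimal sketch: the version strings match the ones mentioned in this thread, while the `IpVersion` type and the `line` labels (for example separating HBM from GDDR6 UMCs) are hypothetical bookkeeping, not anything read from the firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpVersion:
    block: str                # e.g. "UMC", "NBIF", "GC"
    line: str                 # hypothetical label for the development line, e.g. "HBM" or "GDDR6"
    version: tuple[int, ...]  # numeric parts, compared lexicographically

    @classmethod
    def parse(cls, block: str, line: str, text: str) -> "IpVersion":
        # "7.4.2" -> (7, 4, 2); wildcard parts like "8.X" are deliberately not supported
        return cls(block, line, tuple(int(part) for part in text.split(".")))


def newer_than(a: IpVersion, b: IpVersion) -> bool:
    """True if a is a later release than b; raises if they belong to different IP lines."""
    if (a.block, a.line) != (b.block, b.line):
        raise ValueError(f"{a.block}/{a.line} and {b.block}/{b.line} are different IP lines")
    return a.version > b.version
```

With this, asking whether Navi12's HBM UMC (6.5) is older than a GDDR6 UMC raises an error instead of silently answering "yes". The same caution applies to NBIF: 7.4.2 on the XSX versus 2.x/3.x on Navi1X/2X says little about age if those are separate lines.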
The XSX uses Graphics Core (GC) version 10.2, the PS5 uses 10.0.
Some people probably wondered why AMD's RDNA1 GPUs start with 10.1 and not 10.0, and why there is a jump to 10.3 on RDNA2 GPUs.
Custom chips are using 10.0 and 10.2 GC IP.
So what's the difference there?
That's not so easy to answer, since a couple of differences between RDNA1 and RDNA2 GPUs can't be checked on the XSX.
However, there is one piece that AMD's open-source drivers mention.
The pa_sc_tile_steering_override bit (PA = Primitive Assembly, SC = Scan Converter) is not programmable under GFX10.3, but it was on all previous GFX10.x versions.
GFX10.3 also removed the DFSM hw for full primitive binning.
GFX10.1 still has it, and GFX10.2 potentially does too, but it's unlikely that Microsoft would expose it even if it does.
Other things that can't be confirmed or denied are larger SDMA copy sizes on GFX10.2 (or rather, on the SDMA version side).
The same goes for whether MSAA image loads, the global thread ID for loads and stores, and the global atomic clamped subtraction, which came with GFX10.3, are included on GFX10.2.
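The version-dependent differences above can be written down as a small lookup with an explicit "unknown" state. A sketch: the GC versions, chip assignments and feature facts come from this thread, `None` marks what can't be confirmed for GFX10.2, and the function names are hypothetical (this is not the amdgpu driver's code).

```python
from typing import Optional

# GC (Graphics Core) IP versions as described in this thread
GC_VERSIONS = {
    (10, 0): "PS5",
    (10, 1): "RDNA1 (Navi1X)",
    (10, 2): "Xbox Series X/S",
    (10, 3): "RDNA2 (Navi2X)",
}

# DFSM: present on GFX10.1, removed on GFX10.3; other versions unconfirmed
DFSM_KNOWN = {(10, 1): True, (10, 3): False}


def tile_steering_override_programmable(gc: tuple[int, int]) -> bool:
    # pa_sc_tile_steering_override is programmable on every GFX10.x before 10.3
    if gc not in GC_VERSIONS:
        raise KeyError(f"GC {gc} is not covered by this sketch")
    return gc < (10, 3)


def has_dfsm(gc: tuple[int, int]) -> Optional[bool]:
    # None means "can't be confirmed or denied", e.g. for the XSX's GFX10.2
    return DFSM_KNOWN.get(gc)
```

Using a tri-state instead of a plain boolean keeps "not confirmed" from being silently read as "not present".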
Even more interesting would be to know whether the allocation granularity for the register file and Local Data Share is twice as large, as on RDNA2, or the same as on RDNA1 GPUs.
I rather think the latter is the case, to get behavior more closely related to GCN.
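Why the allocation granularity matters: a wave's register (or LDS) request is rounded up to a multiple of the granule, so a coarser granule can waste space and reduce how many waves fit. A minimal sketch; the granule sizes and the budget used with it are placeholder parameters for illustration, not confirmed RDNA1, RDNA2 or XSX values. Only the 2x ratio between the granules reflects the thread.

```python
def allocated(used: int, granule: int) -> int:
    """Round a register (or LDS) request up to the allocation granularity."""
    return -(-used // granule) * granule  # ceiling division, then back to a multiple


def waves_that_fit(used: int, granule: int, budget: int) -> int:
    """How many equal allocations fit into a fixed budget (ignores wave-slot limits)."""
    return budget // allocated(used, granule)


# Hypothetical example: a shader needing 40 registers, with a granule of 8 vs. 16
for granule in (8, 16):
    print(granule, allocated(40, granule), waves_that_fit(40, granule, 1024))
```

With these made-up numbers, the finer granule allocates exactly 40 registers (25 waves fit in a budget of 1024), while the doubled granule rounds up to 48 (21 waves). In practice occupancy is also capped by the number of wave slots per SIMD.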
Now we can talk about some differences we know about.
1.) Like RDNA1 GPUs, the XSX uses 20 Wave Controllers per SIMD engine.
Every RDNA2 chip from AMD only has 16.
I don't think this has any significant performance implications; it was an economic cut for RDNA2. For the Xbox Series, however, it was probably desirable to support the same thread group size as previous GCN versions for smoother backwards compatibility.
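A quick way to put 20 vs. 16 Wave Controllers into perspective is to count wave slots per Compute Unit. This assumes the commonly cited layouts of 4 SIMD16 units with 10 wave slots each on a GCN CU and 2 SIMD32 units per RDNA CU; those per-CU figures are not from the thread, only the 20 and 16 are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuLayout:
    simds_per_cu: int
    wave_slots_per_simd: int

    @property
    def wave_slots_per_cu(self) -> int:
        return self.simds_per_cu * self.wave_slots_per_simd


# 20 (XSX/RDNA1) and 16 (RDNA2) wave slots per SIMD are from the thread;
# SIMDs per CU and the GCN values are assumptions based on commonly cited figures.
GCN = CuLayout(simds_per_cu=4, wave_slots_per_simd=10)
XSX_RDNA1 = CuLayout(simds_per_cu=2, wave_slots_per_simd=20)
RDNA2 = CuLayout(simds_per_cu=2, wave_slots_per_simd=16)

for name, cu in [("GCN", GCN), ("XSX/RDNA1", XSX_RDNA1), ("RDNA2", RDNA2)]:
    print(f"{name}: {cu.wave_slots_per_cu} wave slots per CU")
```

Under these assumptions, 20 slots per SIMD keep the GCN total of 40 resident waves per CU, while 16 slots drop it to 32. This only counts slots; it doesn't by itself prove anything about performance.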

2.) The XSX uses the same rendering frontend setup as RDNA1 GPUs:
2 Primitive Units and 2 Scan Converters (each with 2 Packers) per SE.
This doesn't necessarily mean that the rendering frontend is exactly the same as on RDNA1 GPUs; there are many sub-blocks which could be different. Still, one can wonder why such a configuration was chosen.
Was the new frontend design too late for the project, or not desirable?
I lean towards the latter, because RDNA2 GPUs only have one Primitive Unit per SE instead of the previous two.
Together with the new Rasterizer, which now works with 32 pixels, it could have been challenging to get good performance in some cases.
I'm very curious about N22 perf here.
3.) As on RDNA1 GPUs, the XSX also uses two sub-arrays per Shader Array.
We have 4 WGPs on one sub-array (WGP0) and 3 on the other (WGP1).
Currently, not a single upcoming RDNA2 GPU/APU is known to use two sub-arrays (Navi24 could be one?).
They all have their WGPs on only one sub-array.
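The 4 + 3 WGP split also adds up to the XSX's CU count. A worked sketch, assuming 2 Shader Engines with 2 Shader Arrays each and 2 CUs per WGP (layout details not stated in this part of the thread); the WGPs per sub-array are from the thread.

```python
SHADER_ENGINES = 2          # assumption
SHADER_ARRAYS_PER_SE = 2    # assumption
WGPS_PER_SUBARRAY = (4, 3)  # WGP0 and WGP1 sub-arrays, from the thread
CUS_PER_WGP = 2             # a WGP pairs two CUs

wgps_per_sa = sum(WGPS_PER_SUBARRAY)
cus_on_die = SHADER_ENGINES * SHADER_ARRAYS_PER_SE * wgps_per_sa * CUS_PER_WGP
print(wgps_per_sa, cus_on_die)  # 7 56
```

Under these assumptions the die carries 56 CUs, which matches the XSX die; the retail console enables 52 of them.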
That's basically it in terms of differences.
Since the XSX supports ray tracing acceleration, Mesh Shaders, Sampler Feedback and Variable Rate Shading, and also shows great efficiency numbers (indicating that a lot of the physical design work for RDNA2 is used), I would give the marketing guys a green light here.
It's fair to say that it's based on RDNA2, although some (larger) differences exist.
They just don't matter that much for the end results, and the most important features are included.
Nonetheless, it's fascinating to see how it differs.
