Locuza Profile picture
19 Jul, 6 tweets, 2 min read
A discussion and a point of curiosity are now resolved.
Van Gogh, which is used by Valve's Steam Deck, has 4 UMCs.
I expected 4x 16-Bit (a memory channel under LPDDR5 is actually 16-Bit wide).
The official spec claimed 5.5 Gbps (dual-channel), which didn't make sense to me.
It has since been corrected.
Valve now claims 4x 32-Bit (128-Bit), which fits 4 UMCs.
It also means that, as on Renoir/Cezanne, AMD is using a controller design with 32-Bit granularity instead of 16-Bit channels.
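
To make the difference between the two readings concrete, here is a rough sketch of the peak bandwidth arithmetic. It assumes the 5.5 Gbps figure is the per-pin data rate and that peak bandwidth is simply bus width times data rate; the function and variable names are made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

# Assumption: 5.5 Gbps is the per-pin data rate quoted in the spec.
DATA_RATE_GBPS = 5.5

# The two configurations discussed in the thread, both with 4 UMCs.
configs = {
    "4x 16-Bit (64-Bit)": 4 * 16,
    "4x 32-Bit (128-Bit)": 4 * 32,
}

bandwidths = {
    name: peak_bandwidth_gbs(width, DATA_RATE_GBPS)
    for name, width in configs.items()
}

for name, gbs in bandwidths.items():
    print(f"{name}: {gbs:.1f} GB/s")
```

Under these assumptions the 64-Bit reading gives 44 GB/s and the 128-Bit reading gives 88 GB/s, both theoretical peaks rather than achievable throughput.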

Even 64-Bit LPDDR5 wouldn't have been bad for the Steam Deck specs, but now the bandwidth looks great.
Compared to current-gen consoles, and purely from the GPU perspective, you get more GB/s per TFLOP.
A small comparison:
XSX: 46.09 GB/s per GPU TFLOP
XSS: 55.91 GB/s per GPU TFLOP
PS5: 43.58 GB/s per GPU TFLOP
Steam Deck: 53.72-85.94 GB/s per GPU TFLOP
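
The ratios above can be roughly reproduced by dividing peak memory bandwidth by peak GPU FP32 throughput. The bandwidth and TFLOP inputs in this sketch are not stated in the thread; they are assumed from the commonly quoted public specs (with the Steam Deck GPU at 1.0-1.6 GHz), so treat them as illustrative.

```python
# Assumed inputs (not from the thread): peak memory bandwidth in GB/s
# and peak GPU FP32 throughput in TFLOPS, as commonly quoted.
SYSTEMS = {
    "XSX": (560.0, 12.15),
    "XSS": (224.0, 4.006),
    "PS5": (448.0, 10.28),
    "Steam Deck @ 1.6 GHz GPU": (88.0, 1.638),
    "Steam Deck @ 1.0 GHz GPU": (88.0, 1.024),
}

def gbs_per_tflop(bandwidth_gbs: float, tflops: float) -> float:
    """Memory bandwidth available per unit of GPU compute."""
    return bandwidth_gbs / tflops

ratios = {name: gbs_per_tflop(bw, tf) for name, (bw, tf) in SYSTEMS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} GB/s per GPU TFLOP")
```

With these inputs the results land within rounding of the figures listed above; small differences depend on how the TFLOP values are rounded.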
Some extra comments to set expectations are probably necessary.
The GB/s per GPU TFLOP comparison obviously doesn't consider other clients sharing the memory bus, the number of memory channels and banks per system, the power and memory bandwidth split, the software stack differences, etc.
While 1280x800 has about half as many pixels as 1920x1080 (Xbox Series S) and the GB/s per GPU TFLOP is greater than on stationary consoles, game workloads other than resolution either don't scale down or do so to a far lesser degree.
Geometry, gameplay logic, audio systems, etc.
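
As a quick check of the "about half" figure, a minimal sketch of the pixel count arithmetic (names are illustrative):

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a frame of the given resolution."""
    return width * height

deck_pixels = pixel_count(1280, 800)    # Steam Deck display
fhd_pixels = pixel_count(1920, 1080)    # 1080p target used for comparison

# Fraction of 1080p pixels that the Steam Deck has to render per frame.
pixel_ratio = deck_pixels / fhd_pixels

print(f"{deck_pixels} vs. {fhd_pixels} pixels, ratio {pixel_ratio:.3f}")
```

The ratio comes out slightly under 0.5, which only describes per-pixel work and says nothing about the workloads that do not scale with resolution.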
The Steam Deck has less than half of the CPU power, and the GPU also has strongly reduced fixed-function geometry capabilities.
Low-level APIs and optimizations, as on consoles, are not available.
Don't expect an Xbox Series/PS5 experience, let alone a better one, at 1280x800.
