Willow Cove has a massive amount of cache and should do over 20% better per clock than Renoir.
At 15-30W I'm also sceptical about how well the 8 cores on Renoir scale, but the results are out there; I just didn't have the time to look.
2/x
For me the device will be mostly for browsing and some casual games, so I'd rather take the better ST performance.
3/x
Since both share the same system memory, the less needy the CPU is, the more effective bandwidth is available for the GPU.
5/x
That's just a killer feature for every grandma and casual pleb out there.
Seriously, I'm happy that Intel is increasing the install base for it. Skylake-X was just for a few high-end users, and Cannon Lake was extremely low volume.
6/x
Tiger Lake should have a larger volume.
On AMD's side maybe not even Zen3 will support AVX512, but fingers crossed.
7/x
AMD hasn't increased the theoretical throughput of the rendering front end and back end since its first APU, Llano, in 2011.
1 Triangle per clock.
I think 16 pixels per clock...
8/x
But the rendering backend can definitely only output 8 pixels/clock.
9/x
Intel throws more resources at the problem.
Xe LP in Tiger Lake can do 2 Triangles/clock and the render backend can output 24 Pixels/clock, that's a level above Renoir's theoretical capabilities.
10/x
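To put the per-clock numbers side by side, here is a minimal sketch. It only uses the triangle and pixel rates quoted in this thread; the `RATES` table and the helper names are made up for illustration, and real throughput also depends on clocks and on how well the pipeline is fed.

```python
# Theoretical per-clock rates as quoted in the thread (not measured values).
RATES = {
    "Renoir": {"triangles_per_clock": 1, "pixels_per_clock": 8},
    "Xe LP": {"triangles_per_clock": 2, "pixels_per_clock": 24},
}


def per_clock_ratio(metric):
    """How many times more Xe LP can do per clock than Renoir for one metric."""
    return RATES["Xe LP"][metric] / RATES["Renoir"][metric]


def peak_rate_g_per_s(per_clock, clock_ghz):
    """Peak rate in billions per second for a given GPU clock in GHz."""
    return per_clock * clock_ghz


if __name__ == "__main__":
    for metric in ("triangles_per_clock", "pixels_per_clock"):
        print(metric, "Xe LP vs. Renoir:", per_clock_ratio(metric), "x")
```

So on paper that is 2x the triangle rate and 3x the pixel output per clock.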
Now the question becomes what can the Raster-Engine deliver?
I guess at least 24 pixels/clock, otherwise Intel wouldn't have built such a wide backend.
11/x
Intel didn't unveil all the details, so there is further guesswork.
Since Intel doubled the number of cores per subslice, I think the Local Data Share (LDS) capacity was also doubled to 128KiB; Intel calls it Shared Local Memory (SLM).
12/x
64KiB for 128 "cores" vs. 16KiB for 64 "cores".
Again, more data kept local means higher practical performance is possible.
13/x
Since RDNA1 AMD can also co-execute main ops and SFU ops.
I suspect SFU ops are used rarely in applications but it's a small bonus point for Xe.
Another bonus point is the dot product instructions...
15/x
There was also a demonstration vs. 4800U where a photo filter was accelerated by such instructions.
16/x
One cool feature from Intel's Gen graphics was a variable SIMD execution model.
Intel could use SIMD8, SIMD16 or SIMD32.
Depending on the code, a narrower or wider work set is better.
Intel could adapt.
18/x
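One way to picture why the SIMD width matters is lane utilization. This is a simplified sketch that only counts idle lanes when the work size isn't a multiple of the SIMD width; it ignores register pressure, divergence and latency hiding, which also play a role. The function name and the example of 40 work items are made up for illustration.

```python
import math


def lane_utilization(work_items, simd_width):
    """Fraction of SIMD lanes doing useful work for a given work size."""
    # Number of SIMD issues needed to cover all work items.
    issues = math.ceil(work_items / simd_width)
    return work_items / (issues * simd_width)


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(f"SIMD{width}: {lane_utilization(40, width):.1%} of lanes busy")
```

With 40 work items, SIMD8 fills every lane, SIMD16 leaves 8 of 48 lanes idle and SIMD32 leaves 24 of 64 idle, so the narrow mode wins in this toy case. Larger, uniform workloads are where the wider modes fill up.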
Since RDNA1 AMD supports SIMD64 and SIMD32.
Nvidia uses a logical SIMD size of 32 elements, which they call a warp.
But what does Xe support and use?
19/x
"Per core performance" should be good on Xe LP and Intel has 768 "cores" vs. 512 "cores" on Renoir.
20/x
4700G desktop Renoir at 2.1 GHz = 2.150 FP32 TFLOPS.
Obviously you need something to feed the engine, otherwise iGPU perf won't scale.
21/x
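The TFLOPS figure is just cores x FLOPs per clock x clock. A minimal sketch, assuming each "core" does one FMA per clock (counted as 2 FLOPs), which is the usual convention for these numbers; the function names are made up for illustration.

```python
def fp32_tflops(cores, clock_ghz, flops_per_core_per_clock=2):
    """Peak FP32 throughput in TFLOPS (assumes 1 FMA = 2 FLOPs per core per clock)."""
    return cores * flops_per_core_per_clock * clock_ghz / 1000.0


def clock_for_tflops(cores, tflops, flops_per_core_per_clock=2):
    """Clock in GHz that a GPU with this many cores needs to reach a TFLOPS target."""
    return tflops * 1000.0 / (cores * flops_per_core_per_clock)


if __name__ == "__main__":
    renoir = fp32_tflops(512, 2.1)
    print(f"Renoir, 512 cores at 2.1 GHz: {renoir:.3f} TFLOPS")
    print(f"Clock for 768 cores to match: {clock_for_tflops(768, renoir):.2f} GHz")
```

Under that assumption, 768 "cores" match the 4700G's peak number at 1.4 GHz. Peak FLOPS says nothing about how well the engine is fed, though.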
Xe LP effectively has twice as much L1$/Tex$ per "core".
And a large bonus is the L2$ on the GPU side.
Xe has 3.8MB L2$, Renoir just 1MB.
22/x
Intel spent a lot of area on this because memory tops out at DDR4-3200 or LPDDR4X-4266.
With large caches Intel gets perf scaling.
23/x
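For a feel of how tight the memory side is, here is a rough sketch of peak bandwidth for the two memory speeds. The 128-bit total bus width is my assumption (typical dual-channel setup), it's not stated in this thread, and the function name is made up for illustration.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_width_bits=128):
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits=128 is an assumed total bus width, not a figure from the thread.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    print(f"DDR4-3200:    {peak_bandwidth_gbs(3200):.1f} GB/s")
    print(f"LPDDR4X-4266: {peak_bandwidth_gbs(4266):.1f} GB/s")
```

Under that assumption it's roughly 51 GB/s vs. 68 GB/s, shared between CPU and GPU, which is why big on-die caches help.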
Currently 40% better GPU perf vs. Renoir is what I expect in many games.
I would say now the game is "fair".
7nm TSMC vs. 10nm SF.
Haswell or Broadwell already won vs. Kaveri or Carrizo but Intel had a massive process advantage + eDRAM.
25/x
Cezanne will again use 512 cores with GCN5, while Van Gogh pairs RDNA2 with Zen2 cores.
It's a weird mix where Intel will win on one or the other front.
26/x
Tiger Lake has AV1 decoding, Thunderbolt4, USB4 and PCIe4 for an SSD vs. Renoir, which has no AV1 decoding, no Thunderbolt4, no USB4 and only PCIe3 to external devices.
For me Tiger Lake is really the superior platform.
27/x
----
And yes Willow Cove is a big boi but higher performance per clock has to come from somewhere.
29/x
32.17mm² for Renoir vs. 43.81mm² for Tiger Lake (+36%).
One glaring aspect is the PHY area for memory.
It's a massive portion on Renoir vs. TGL.
30/30
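A quick check of the +36% from the two area figures above; the helper name is made up for illustration.

```python
def relative_increase_percent(base, other):
    """How much larger `other` is than `base`, in percent."""
    return (other / base - 1.0) * 100.0


if __name__ == "__main__":
    delta = relative_increase_percent(32.17, 43.81)
    print(f"Tiger Lake vs. Renoir area: +{delta:.0f}%")
```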