
Andy Somerfield

Oct 23, 2021 • 32 tweets • 9 min read


So, with the upcoming delivery of new #M1Pro and #M1Max hardware to customers next week, I thought I’d spend a couple of days talking about how @affinitybyserif Photo uses GPUs, how our benchmark works and what we should reasonably expect from this new hardware :)

We’ll look at the history of GPU support in Photo, right back to 2009 when we were designing the architecture for the app, working our way right through to now.

I’ll be posting benchmark results for various hardware throughout, so it’s worth understanding what the benchmark in Photo means. It does *not* indicate how “fast” a GPU is - there is no single measure of GPU performance.

It certainly does, however, indicate how fast a GPU can make Photo run. Some GPUs will “win” in our benchmark and “lose” in other benchmarks. Remember - real-world results in the applications you are interested in are all that matters.

In Photo, an ideal GPU would do three different things well: 1) high compute performance, 2) fast on-chip bandwidth, and 3) fast transfer of data on and off the GPU.

Way back in 2009, no GPU did all three things well - but we thought that eventually the industry would get there, so we took a risk and designed the entire architecture based on that assumption. Things didn’t go entirely to plan...

We shipped Photo in 2015 - six years after the design phase - without GPU compute support :(

A GPU which did all the things we needed simply didn’t exist. We wondered if we had backed the wrong horse. Happily, a short while later it did exist - but it was in an iPad 😬!

We always intended to release Photo on iPad at some point but early results when experimenting with the A9X silicon got us quite excited. A couple of years later we ended up releasing Photo iPad in a live-stage demo at the WWDC keynote in 2017 😬.

The demo went perfectly - which we were absolutely confident was exactly what would happen with a pre-release build running on hardware nobody in engineering had ever even seen 😬🔥.

Once the dust settled though, we found ourselves in an interesting position: the iPad version of our app ran much faster than any of the desktop versions, because it used the GPU.

Here is our first benchmark result - from the iPad Pro which launched Affinity Photo for iPad - an A10X 10.5” unit from 2017:

And here is our managing director Ash - launching Photo iPad at WWDC 2017. He wasn’t nervous at all - I promise 😉

A-series silicon represents, in my opinion, an inflection point for Apple - the realisation that they could make the “whole widget”, and that they could do a better job internally than by buying CPU/GPU parts from third parties.

iPad performance has gone from strength to strength since - here’s the current state of play on the latest M1-based 12.9” iPad Pro in 2021:

Progress indeed... but, winding back, we still had nothing to work with on the desktop in 2017. Apple moving to their own SoC for the Mac wasn’t even a rumour at that point, so we needed to do something different - we needed to, reluctantly, change direction...

Intel did come to the party with their Skylake integrated graphics parts on the desktop. Skylake implemented UMA (unified memory architecture, which gave us “part 3” of the three requirements above), but had fairly modest compute capability and on-chip bandwidth.

I don’t have one of the Intel machines from back in 2016, but here is a result from one of the most powerful Intel integrated GPUs Apple ever shipped - an Iris Plus Graphics 655 from a 13” MBP:

Enabling the Intel GPU made the app faster in 2016, but not by as much as we had hoped, so we shipped 1.6 with the feature switched off by default. We finally decided we needed to make discrete GPUs work in Photo 😬

To do this we would have to change a big part of our original design. This took ages and broke lots of things - anyone who was brave enough to participate in the Photo 1.7 beta will attest to this :) Finally though, we got there and started to see some good numbers.

Discrete GPUs have great compute performance and great on-chip bandwidth - the only thing they lack is “part 3” - fast transfer on and off the GPU. We did our best to hide that latency in Photo 1.7.
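A toy timing model helps show why “part 3” matters more as GPUs get faster. It is purely illustrative. The tile counts and per-tile costs are made-up numbers, not Photo measurements. The three-stage overlap is a generic latency-hiding technique, not a description of how Photo 1.7 actually does it. The model also assumes uploads, compute and downloads can all run at the same time on different tiles.

```python
# Toy timing model (hypothetical numbers, not measurements): a large image is
# split into tiles that must be uploaded to a discrete GPU, processed, and
# downloaded again. Times are in arbitrary units per tile.


def serial_time(n_tiles, upload, compute, download):
    """No overlap: each tile is copied in, processed and copied back in turn."""
    return n_tiles * (upload + compute + download)


def pipelined_time(n_tiles, upload, compute, download):
    """Idealised three-stage pipeline that hides transfer latency.

    Assumes uploads, compute and downloads can run concurrently on different
    tiles, so once the first tile has passed through, one tile completes per
    'slowest stage' interval.
    """
    if n_tiles == 0:
        return 0
    slowest = max(upload, compute, download)
    return (upload + compute + download) + (n_tiles - 1) * slowest


def uma_time(n_tiles, compute):
    """Unified memory: the GPU reads the data in place, so no copies remain."""
    return n_tiles * compute


if __name__ == "__main__":
    # Compute-bound case: hiding the copies gets close to the UMA figure.
    print(serial_time(100, 2, 3, 2), pipelined_time(100, 2, 3, 2), uma_time(100, 3))
    # A faster GPU on the same bus: compute shrinks, but the copies do not.
    print(serial_time(100, 2, 1, 2), pipelined_time(100, 2, 1, 2), uma_time(100, 1))
```

With these made-up numbers the first line prints `700 304 300` and the second prints `500 203 100`. Overlapping the copies almost matches UMA while compute is the bottleneck. Once the GPU outpaces the bus, the copies set the pace, and only removing them (UMA) helps.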

Here’s a result from a 16” MBP from a couple of years ago with a Radeon Pro 5500M. It also has an Intel integrated GPU, so we use them both at the same time (Multi GPU result).

Maybe this sort of performance would be a reasonable minimum expectation for the new M1 Pro / M1 Max silicon? Or maybe we should hope for a bit more because these parts are quite old now?

The lack of UMA didn’t really seem to matter - the benchmark scores on modern discrete GPUs are very high. Photo also seemed to scale well with more powerful discrete GPUs.

Just for completeness, here is the fastest thing we have ever had on the bench - it’s a 300W, $6000 W6900X in a 12-core Mac Pro (with wheels 😎). I don’t think we can expect this sort of performance though, can we 😂?

Maybe everything was fine and our work in 1.7 to hide the latency introduced by discrete GPUs was enough 🤷‍♂️. The only thing which could disprove that theory would be if someone started making integrated GPUs with similar compute and on-chip bandwidth to these discrete GPUs.

Maybe then we could see what difference the UMA requirement in our original design would actually make...

#M1Pro and #M1Max certainly sound like they have UMA GPUs with similar compute performance and on-chip bandwidth to high-end discrete GPUs, right? Let’s see what difference that makes, then... let’s see what this “ideal” GPU we designed our apps for way back in 2009 actually scores.


The #M1Max is the fastest GPU we have ever measured in the @affinitybyserif Photo benchmark. It outperforms the W6900X - a $6000, 300W desktop part - because it has immense compute performance, immense on-chip bandwidth and immediate transfer of data on and off the GPU (UMA).

If you are an Affinity user who works on documents which push the limits of your current device, these new MacBook Pro units appear to be a very good upgrade choice.

Especially as the GPU isn’t the only big win here - the “Vector (Multi CPU)” score in the #M1Max is the highest we have ever measured (for Affinity Designer users), as is the “Combined (Single GPU)” score (for Affinity Publisher, by some margin).


Affinity 1.10.3 has just been released. It includes numerous significant optimisations for these new devices and a super-secret extra page of magnificent XDR-only samples, courtesy of our very own @JamesR_Affinity. Try loading them up in a dimly lit room!

Thanks for reading this (huge!) thread. I’ll be back later in the week to talk a bit about these XDR displays and what they mean for photo editing.
