Fabian Giesen @rygorous, 21 tweets
Since @scalzi ranted about it and it's A Thing right now, for your entertainment, some of the ways I (professional game tech programmer of ~10 years, 8 years of which I've been at a game middleware company) have inadvertently caused multi-hundred-MB patches for trivial changes:
1. Wrote a terrain engine with (I still think) individually sensible design decisions!
Some of the level data was used on the CPU too, but a lot of it was GPU-side only, so it ended up getting atlased into a couple of large textures. (Terrain is usually broken into small chunks; a set of textures per chunk would be too much overhead, so we group them together.) Problem: you edit the map and change tessellation levels _somewhere_, and that changes what gets packed into the atlas, and in what order. End result: even tiny edits in some corner of the map can cause ripple effects that leave all of the textures for some decent-sized chunk of the world different, which was (at the time, in 2008) easily ~10MB of new data every time it happened.
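A minimal sketch of that failure mode, with hypothetical types and a toy row-major packer rather than the actual engine code: chunks get packed into the atlas in sequence, so one chunk whose texture size changes moves everything packed after it.

```cpp
// Minimal sketch (hypothetical types and packer, not the actual engine) of why
// atlas packing is order-sensitive: chunks are packed in sequence, so changing
// one chunk's texture size shifts every chunk packed after it.
#include <algorithm>
#include <cstdint>
#include <vector>

struct ChunkTexture {
    uint32_t chunk_id;
    uint32_t width, height;  // derived from the chunk's tessellation level
};

struct AtlasSlot {
    uint32_t chunk_id;
    uint32_t x, y;  // placement inside the big shared atlas texture
};

std::vector<AtlasSlot> pack_row_major(const std::vector<ChunkTexture>& chunks,
                                      uint32_t atlas_width)
{
    std::vector<AtlasSlot> slots;
    uint32_t cur_x = 0, cur_y = 0, row_height = 0;
    for (const ChunkTexture& c : chunks) {
        if (cur_x + c.width > atlas_width) {  // row full: start a new one
            cur_x = 0;
            cur_y += row_height;
            row_height = 0;
        }
        slots.push_back({c.chunk_id, cur_x, cur_y});
        cur_x += c.width;
        row_height = std::max(row_height, c.height);
    }
    // Change the size of one chunk early in `chunks` and every later chunk
    // lands somewhere else, so most of the atlas differs at the byte level.
    return slots;
}
```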
2. Lightmappers (I)! The terrain in question also had light maps. Our light mapper used some randomized algorithms (outside the scope of this thread, but that's A Common Thing for this task, for Not Stupid But Complicated Reasons).
Every time you run such a light mapper, even on the same input data, all the light maps end up different. You can try to de-randomize or use pseudo-randomness in a way that at least gives the same results if you lightmap the same thing twice,
but if there are actual changes to the level somewhere, it's pretty hard to avoid getting tons of changes everywhere.
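A sketch of the "keyed pseudo-randomness" idea; the hash and function names here are illustrative, not the actual lightmapper:

```cpp
// Sketch (illustrative, not the actual lightmapper) of keyed pseudo-randomness:
// derive each sample sequence's seed from the chunk/texel identity instead of a
// global RNG, so re-baking unchanged geometry produces bit-identical samples.
#include <cstdint>

// SplitMix64-style integer mixer used as a deterministic seed function.
uint64_t mix(uint64_t x) {
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

uint64_t sample_seed(uint32_t chunk_id, uint32_t texel_x, uint32_t texel_y) {
    return mix((uint64_t(chunk_id) << 40) ^ (uint64_t(texel_x) << 20) ^ texel_y);
}
// Even with stable seeds, bounced light from an edited chunk still reaches its
// neighbors, so their lightmaps legitimately change anyway.
```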
3. Lightmappers (II). Calculating light maps takes a while and usually involves clusters of machines and some GPU-side work. Generally, not all machines in your cluster are the same; there are older and newer ones, and some have different GPUs or driver versions.
Work gets farmed out to whatever cluster machine currently has spare time. So even if your light mapping has no randomization, the same chunk of world might get run on different machines with different GPUs and different shader compilers on subsequent runs.
That causes a ripple effect where again everything slightly changes (at the bitwise level), even when the result ends up visually indistinguishable from what it was before.
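A tiny, self-contained illustration, with values picked just to make the effect obvious: floating-point addition isn't associative, so a compiler or GPU that reorders a sum produces different bits.

```cpp
// Tiny illustration of why "same math, different compiler/GPU" changes bits:
// floating-point addition is not associative, so reordering a sum gives a
// (slightly) different result.
#include <cstdio>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    float left  = (a + b) + c;  // 1.0f
    float right = a + (b + c);  // 0.0f: b + c rounds back to -1e8f
    std::printf("%g vs %g\n", left, right);
    return 0;
}
```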
4. Optimized the decoder for an audio codec.
Wait, what? OK, this takes a bit more explanation. Lossy audio compression involves running some transforms over the source data to get it into a form that's easier to compress, and then deciding what's important and allocating bits.
A lot of these transforms are ultimately based on FFTs, and that's what I was optimizing. So I replaced the old FFT code with something a good deal faster (based on a slightly different algorithm), which is yet another of those things that makes Everything Slightly Different,
But Not In A Way You'd Notice. The goal was to make the decoder faster. But the encoder uses the same code! (Because FFTs are a general building block.) So the encoder changed too, again producing numerically slightly different but indistinguishable-to-human-ears results.
This in turn meant the next asset build at a customer site produced changes in nearly every single compressed audio file, from a few off-by-1 differences in roughly 1 out of every 1000 transform coefficients.
Sadly, after you bit-pack and compress those kinds of changes, the delta patch for the audio files typically ends up being essentially a rewrite. "Oops."
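A sketch of how that plays out, with a made-up quantizer and made-up values rather than the actual codec: a coefficient that lands on the other side of a rounding boundary quantizes to a different integer, and the packed bitstream diverges from there.

```cpp
// Sketch (made-up quantizer and values, not the actual codec) of how a last-bit
// difference in a transform coefficient becomes an off-by-1 in the packed output.
#include <cmath>
#include <cstdio>

int quantize(float coeff, float scale) {
    return (int)std::lround(coeff * scale);
}

int main() {
    float old_fft = 0.1249999f;  // coefficient from the old FFT
    float new_fft = 0.1250001f;  // same coefficient from the faster FFT
    // Indistinguishable to the ear, but they land on opposite sides of a
    // rounding boundary and quantize to different integers...
    std::printf("%d vs %d\n", quantize(old_fft, 4.0f), quantize(new_fft, 4.0f));
    // ...and once those integers are bit-packed and compressed, everything
    // after the first differing symbol shifts, so the binary delta is huge.
    return 0;
}
```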

Bonus fun: for this reason, the _decoder_ can use things like newer SSE kernels or AVX2 with FMAs, but the encoder can't.
Why? Again, clusters. If some of the machines in your cluster have AVX2 and some don't, and the encoder uses AVX2 when available but produces essentially-equivalent-but-not-bitwise-identical results if so, that's effectively a non-deterministic build.
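A small illustration of why FMA specifically breaks bit-equality, in plain C++ rather than any particular encoder:

```cpp
// A fused multiply-add skips the intermediate rounding of the product, so it
// recovers bits that a plain multiply throws away; fused and unfused code
// paths therefore don't agree bit-for-bit.
#include <cmath>
#include <cstdio>

int main() {
    double a = 1.0 + 1e-8;
    double prod = a * a;                  // plain multiply: product rounded to 53 bits
    double err  = std::fma(a, a, -prod);  // fused: the rounding error, nonzero (~1e-16)
    std::printf("rounding error recovered by FMA: %g\n", err);
    return 0;
}
```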
...point of this long thread being: there are a few running themes here.
Namely, the combination of:
1. lots of data that _needs_ to be batched together for good IO patterns or efficient CPU/GPU consumption
2. it being Actually Hard to make 1) stable wrt typical asset changes
3. many build steps on lots of assets that you need to farm out to a cluster if you want useful iteration times
4. optimized versions of build steps that want to use GPUs or newer CPU features but end up causing some amount of build non-determinism when they do because machines and drivers aren't identical
5. some of this being reliant on randomized algs
makes it A Lot Harder Than You Would Think to get deterministic, small, proportional-to-the-amount-of-changes patch files, even if you're trying to.
(And even if you're not being Gratuitously Non-Deterministic For No Good Reason in the way you generate pack files etc.)
</thread>