Some of the level data was used on the CPU too, but a lot of it was GPU-side only, so it ended up getting atlased into a couple of large textures. (Terrain is usually broken into small chunks;
Wait, what? OK, this takes a bit more explanation. Lossy audio compression involves running some transforms over the source data to get it into a form that's easier to compress, and then deciding what's important and allocating bits.
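To make that shape concrete, here's a tiny sketch of the transform-then-allocate idea. It's not any particular codec: the naive DCT, the 8-band split, and the energy-based bit counts are all placeholder choices; a real encoder would use an overlapped MDCT plus a psychoacoustic model and a global bit budget.

```cpp
// Toy "transform, then decide what's important, then allocate bits" encoder
// stage. Illustrative only; not any real codec.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static const double kPi = 3.14159265358979323846;

// Naive DCT-II over one block. Real codecs use an overlapped MDCT; this is
// just "some transform" that concentrates energy into a few coefficients.
std::vector<double> dct(const std::vector<double>& x) {
    const size_t n = x.size();
    std::vector<double> out(n, 0.0);
    for (size_t k = 0; k < n; ++k)
        for (size_t i = 0; i < n; ++i)
            out[k] += x[i] * std::cos(kPi / n * (i + 0.5) * k);
    return out;
}

int main() {
    // Stand-in input block: a couple of sine waves.
    const size_t block = 64;
    std::vector<double> samples(block);
    for (size_t i = 0; i < block; ++i)
        samples[i] = std::sin(0.2 * i) + 0.25 * std::sin(1.3 * i);

    std::vector<double> coeffs = dct(samples);

    // Split coefficients into bands, measure energy, and hand louder bands
    // more quantizer bits. A real encoder replaces this crude heuristic with
    // a psychoacoustic model and a fixed overall bit budget.
    const size_t bands = 8, per_band = block / bands;
    for (size_t b = 0; b < bands; ++b) {
        double energy = 0.0;
        for (size_t i = 0; i < per_band; ++i) {
            double c = coeffs[b * per_band + i];
            energy += c * c;
        }
        int bits = std::clamp((int)std::lround(std::log2(energy + 1.0)), 0, 12);
        std::printf("band %zu: energy %10.2f -> %2d bits\n", b, energy, bits);
    }
    return 0;
}
```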
Bonus fun: for this reason, the _decoder_ can use things like newer SSE kernels or AVX2 with FMAs, but the encoder can't.
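A minimal sketch of that split, assuming (as the surrounding determinism discussion suggests) that the encoder runs during the asset build, where its float results feed the decisions that end up in the output bitstream and so must be bit-identical across machines, while the decoder runs at playback time where a last-bit wobble is inaudible. The two kernels below are made up: stand-ins for a scalar path and an AVX2/FMA path that accumulate in different orders.

```cpp
// Why the decoder can dispatch to the fastest kernel per machine while the
// encoder sticks to one fixed path. Kernels are illustrative stand-ins.
#include <cstdio>

// Reference path: one fixed summation (and rounding) order.
float sum_scalar(const float* x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += x[i];
    return s;
}

// Stand-in for an AVX2/FMA kernel: same math, different accumulation order,
// so the low bits of the result can differ from sum_scalar.
float sum_fancy(const float* x, int n) {
    float s0 = 0.0f, s1 = 0.0f;
    for (int i = 0; i + 1 < n; i += 2) { s0 += x[i]; s1 += x[i + 1]; }
    if (n & 1) s0 += x[n - 1];
    return s0 + s1;
}

// Decoder: free to pick whatever the current CPU supports, because nothing
// downstream depends on the decoded samples being bit-exact.
float decoder_sum(const float* x, int n, bool has_avx2) {
    return has_avx2 ? sum_fancy(x, n) : sum_scalar(x, n);
}

// Encoder: always the same kernel on every build machine, because a change
// in rounding changes its decisions and thus the output bitstream.
float encoder_sum(const float* x, int n) {
    return sum_scalar(x, n);
}

int main() {
    float data[5] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f};
    std::printf("decoder: %.9f\nencoder: %.9f\n",
                decoder_sum(data, 5, true), encoder_sum(data, 5));
}
```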
1. lots of data that _needs_ to be batched together for good IO patterns or efficient CPU/GPU consumption
2. it being Actually Hard to make 1) stable wrt typical asset changes
3. many build steps on lots of assets that you need to farm out to a cluster
4. optimized versions of build steps that want to use GPUs or newer CPU features, but end up causing some amount of build non-determinism when they do, because machines and drivers aren't identical
5. some of this being reliant on randomized algs (one way to pin those down is sketched after this list)
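For the randomized pieces, one common fix is to derive the seed from the input asset's bytes rather than from time or hardware entropy, so the same input gives the same "random" choices on every machine and every rebuild. The FNV-1a hash and the "jitter" loop below are just placeholders for whatever content hash and randomized step the build actually uses.

```cpp
// Reproducible randomness for a build step: seed the RNG from a hash of the
// input asset. Hash choice and the "jitter" step are illustrative only.
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// 64-bit FNV-1a over the asset bytes; any stable content hash works here.
uint64_t fnv1a(const std::string& bytes) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : bytes) { h ^= c; h *= 1099511628211ull; }
    return h;
}

int main() {
    // Stand-in for the actual asset file contents.
    std::string asset = "terrain_chunk_042 source data...";

    // Same bytes -> same seed -> same sequence on every machine; change the
    // asset and the sequence changes with it.
    std::mt19937_64 rng(fnv1a(asset));
    for (int i = 0; i < 4; ++i) {
        // Map raw engine output to [0,1) by hand: mt19937_64's output stream
        // is fully specified, while std:: distributions aren't bit-exact
        // across standard libraries, which would reintroduce the problem.
        double u = (rng() >> 11) * (1.0 / 9007199254740992.0);  // / 2^53
        std::printf("placement jitter %d: %.6f\n", i, u);
    }
    return 0;
}
```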