Jarkko Lempiäinen · Apr 17 · 40 tweets
Local lights are now live in HypeHype! 🎉

I developed a new stochastic lighting algorithm — from concept to prototype to production — in just 6 months. Super proud of this one! [Details in thread 🧵]

Here's a walkthrough of a game I relit with local lights:
Still ongoing: denoising and optimization. The goal is fully dynamic, fixed-cost local lighting with shadows — running efficiently on a wide range of mobile devices. [1/39]
HypeHype (hypehype.com) is a UGC platform with casual creators (even kids!). They can flood the scene with lights, and it has to just work — no perf drops, no ugly artifacts. A big technical challenge! [2/39]
For context, Unity HDRP uses clustered lighting with 64 lights max per 16x16px tile. More lights = tiling artifacts + higher GPU cost. That’s a no-go for us. [3/39]
We can’t rely on pro lighters working around lighting limitations. Our lighting must be seamless for non-technical users — minimal tuning, great results by default. [4/39]
Games must run decently on everything — from $100 Android phones to high-end PCs. A game made on a flagship iPhone must still play well on budget devices. [5/39]
Lighting also needs to look consistent across devices. Dropping lights could break gameplay for some random game. We could limit light count upon authoring — but where’s the fun in that? 😄 [6/39]
We explored ReSTIR, ReGIR, and similar ideas — but none were mobile-friendly enough. Definitely no ray tracing 😄 So we built our own high-performance stochastic lighting solution tailored to our needs. [7/39]
The algorithm starts by selecting 16 lights per big-tile using Weighted Reservoir Sampling (WRS) with a basic PDF, evaluated via stratified sampling over the tile area. [8/39]
Here’s what the big-tile reservoir texture looks like. Kinda boring, but each 4x4 block in this single-channel image represents a big-tile’s light reservoir. [9/39] [image]
This big-tile WRS uses "without replacement" sampling. This helps to ensure a diverse initial light set — important for the quality of the next resampling step. [10/39]
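The thread doesn't show the sampler itself, but the classic way to do streaming WRS without replacement is the Efraimidis-Spirakis exponential-key trick: keep the k items with the largest u^(1/w) keys. A minimal sketch under that assumption (the per-light weight, i.e. the "basic PDF", is whatever you plug in, e.g. attenuated intensity over the tile):

```python
import heapq
import random

def wrs_without_replacement(lights, weights, k, rng):
    """Keep the k lights with the largest u**(1/w) keys: a one-pass
    weighted sample *without replacement* (Efraimidis-Spirakis A-ES)."""
    heap = []  # min-heap of (key, light); the smallest key gets evicted
    for light, w in zip(lights, weights):
        if w <= 0.0:
            continue  # zero-weight lights can never be picked
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, light))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, light))
    return [light for _, light in heap]
```

A light with a big weight gets a key near 1 and almost always survives, but every light with nonzero weight has a chance, which is what keeps the initial set diverse.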
Next, we resample 1–4 lights per small-tile from the big-tile reservoirs using a more complete PDF (incl. shadow term). This gives high-quality light samples. [11/39]
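The thread doesn't spell out the resampling math, but the description matches standard resampled importance sampling (RIS): weight each candidate by target/source, pick proportionally, and carry the unbiased weight W = (Σw)/(M · p_target). A generic sketch, not the production code:

```python
import random

def resample_ris(candidates, source_pdf, target_pdf, n, rng):
    """Resampled importance sampling: draw n samples proportionally to
    target_pdf/source_pdf, each returned with its unbiased RIS weight
    W = (1/M) * sum(w) / target_pdf(sample). Assumes target_pdf > 0
    for every candidate."""
    M = len(candidates)
    w = [target_pdf(c) / source_pdf(c) for c in candidates]
    w_sum = sum(w)
    out = []
    for _ in range(n):
        x = rng.random() * w_sum
        acc = 0.0
        for c, wi in zip(candidates, w):
            acc += wi
            if x <= acc:
                chosen = c
                break
        out.append((chosen, w_sum / (M * target_pdf(chosen))))
    return out
```

When target equals source, every ratio weight is 1 and W collapses to 1/p, i.e. plain importance sampling, which is a handy sanity check.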
Here’s the small-tile sample texture (4 samples in RGBA). Currently using 4 spp due to lack of denoising — later we may drop to 1 spp with denoising for perf. [12/39] [image]
Each small-tile lights 64 quads (16x16px = 256 pixels). This amortizes resampling cost and improves wave coherence, since all pixels in a tile share the same light set. [13/39]
We built this to run entirely in pixel shaders — to take advantage of framebuffer compression and keep bandwidth low, which is important on mobile. [14/39]
That said, reservoir textures are small and somewhat random, so not sure how much FBC helps there. Compute shader variants with some optimization opportunities are on our roadmap. [15/39]
Currently, we support only point and spot lights, but more types could be added. If VGPR pressure rises, we may classify tiles by light type to improve occupancy. [16/39]
A challenge with small-tile stochastic sampling is correlation artifacts (tiling). We mitigate this by interleaving samples between adjacent tiles using a Gaussian-Poisson distribution. [17/39]
To further decorrelate, we offset the interleaved tiles across frames. This helps with temporal accumulation like TAA. [18/39]
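The thread doesn't say how the per-frame offsets are generated; one cheap, common approach is a small integer hash of (tile, frame), so neighboring tiles and consecutive frames land on different offsets. Everything below (constants included) is illustrative:

```python
def tile_offset(tile_x, tile_y, frame, period=8):
    """Hypothetical decorrelation hash: gives each tile a sample-set
    offset in [0, period) that also shifts every frame, so tiling
    patterns don't line up spatially or temporally."""
    h = (tile_x * 73856093) ^ (tile_y * 19349663) ^ (frame * 83492791)
    h = (h ^ (h >> 13)) & 0xFFFFFFFF  # cheap bit mixing
    return h % period
```

Because the offset is a pure function of (tile, frame), it needs no extra storage and stays consistent across the GPU passes that have to agree on it.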
No denoiser yet — TAA is next. We’ll see how far we get with that "for free" before introducing a dedicated spatio-temporal denoiser. [19/39]
For shadows, we support static & dynamic shadows for point and spot lights. Shadows are ON by default, so even casual users get nice lighting out of the box. [20/39]
Point lights use cubemaps remapped to octahedral space. Spot lights use regular shadow maps. All shadows share a fixed-size atlas, so it's important to use the space efficiently. [21/39]
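The octahedral remap mentioned here is a standard, well-documented mapping: project the direction onto the octahedron |x|+|y|+|z| = 1, then fold the lower hemisphere into the corners of a square. A reference sketch:

```python
def _sign(x):
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(d):
    """Unit direction -> point in the [-1,1]^2 octahedral square."""
    x, y, z = d
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:  # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x, y

def oct_decode(u, v):
    """Inverse of oct_encode: octahedral square back to a unit vector."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:  # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    n = (u * u + v * v + z * z) ** 0.5
    return u / n, v / n, z / n
```

The payoff for shadow maps: one square 2D texture per point light instead of six cube faces, which is exactly what lets them pack into a flat atlas.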
Here’s the shadow map atlas — a mix of shadow map sizes updated dynamically as lights and the camera move. 16bpp single-channel texture to optimize memory usage & bandwidth. [22/39] [image]
Shadow map size is based on the light's distance to the camera and its luminous flux. Maps are allocated/resized/deallocated as needed when those properties change. [23/39]
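The exact sizing formula isn't given in the thread; a plausible heuristic (all constants hypothetical) grows resolution with flux, shrinks it with distance, and snaps to a power of two so maps pack cleanly into the atlas:

```python
import math

def shadow_map_size(distance, flux, base=4096.0, min_size=64, max_size=1024):
    """Hypothetical shadow-map sizing: bright and/or close lights get
    more texels, clamped to an atlas-friendly power-of-two range."""
    raw = base * math.sqrt(flux) / max(distance, 1e-3)
    size = 2 ** round(math.log2(max(raw, 1.0)))
    return int(min(max(size, min_size), max_size))
```

Snapping to powers of two also makes the resize decision stable: small changes in distance or flux usually land on the same bucket, so maps aren't reallocated every frame.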
We reserve a budget for shadow map updates per frame and spread the updates over multiple frames if necessary. [24/39]
For static shadow maps we render only static objects and push them to an update queue only when needed (e.g. upon movement, rotation, camera thresholds, etc.) [25/39]
Dynamic shadows are opt-in. When enabled, both static and dynamic objects are rendered into the map. They are prioritized higher and updated in round-robin based on update age. [26/39]
If too many dynamic shadows are queued, we always squeeze in one static shadow update per frame to avoid starvation. [27/39]
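The scheduling policy from the last two tweets can be sketched as a per-frame planner. Names and data layout here are illustrative, not the actual engine structures:

```python
from collections import deque

def plan_updates(dynamic_queue, static_queue, budget):
    """Pick this frame's shadow-map updates: dynamic maps win (oldest
    update age first), but if any static update is waiting, one budget
    slot is always reserved for it so statics can't starve."""
    updates = []
    reserve_static = 1 if static_queue else 0
    # Dynamic maps, oldest first, filling all but the reserved slot.
    for entry in sorted(dynamic_queue, key=lambda e: -e["age"]):
        if len(updates) >= budget - reserve_static:
            break
        updates.append(entry["id"])
    # Fill the rest (at least the reserved slot) from the static queue.
    while static_queue and len(updates) < budget:
        updates.append(static_queue.popleft())
    return updates
```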
Because shadow updates may be delayed, we use the shadow map’s capture position instead of the current light position for shadow evaluation. [28/39]
Once the shadow atlas is updated, shadow terms are evaluated in a separate quad-resolution deferred shadow pass before the lighting pass. This deferred approach also simplifies the lighting shader. [29/39]
We compute shadow terms per small-tile quad (64 quads). With a fixed number of samples, we store the terms in a fixed-size texture matching the small-tile light samples. [30/39]
We also evaluate IES light profiles during the shadow term pass. Here’s what the shadow term texture looks like — terms stored in RGBA channels. [31/39] [image]
To reduce wave divergence in shadow evaluation, we order interleaved small-tiles in 8x8px squares — helping GPUs pack executions into coherent waves. [32/39]
Deferred shadows = simpler lighting shader & reduced VGPR pressure. They also avoid divergence in the lighting pass: different lights may need different shadow sampling, but since shadow terms are precomputed, the lighting shader never branches on it. [33/39]
We’ll still profile whether the extra memory + pass cost is worth it vs shader complexity reduction and quad-rate evaluation. [34/39]
For shadow filtering, we currently use both 4x PCF (gather-based) and stochastic PCF. The stochastic variant is quite efficient, requiring a single gather per term. Noise is noticeable, but TAA/denoising should smooth it out. [35/39]
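The idea behind the stochastic variant: instead of averaging many PCF taps, jitter a single tap inside the filter disc so the average over pixels/frames approximates the full filter. A simplified sketch with a plain depth compare standing in for the hardware gather:

```python
import math
import random

def stochastic_pcf(shadow_map, uv, receiver_depth, radius, rng):
    """One jittered tap inside a disc of `radius` texels, then a single
    depth compare. Averaged over pixels/frames this approximates a full
    multi-tap PCF filter, trading taps for noise that TAA can absorb."""
    h = len(shadow_map)
    w = len(shadow_map[0])
    angle = rng.random() * 2.0 * math.pi
    r = radius * math.sqrt(rng.random())  # uniform over the disc area
    x = min(max(int(uv[0] * w + r * math.cos(angle)), 0), w - 1)
    y = min(max(int(uv[1] * h + r * math.sin(angle)), 0), h - 1)
    return 1.0 if receiver_depth <= shadow_map[y][x] else 0.0  # 1 = lit
```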
We’re exploring PCSS for contact-hardening shadows and screen-space ray-marched shadows for fine detail. With deferred shadows at quad-rate, these are potentially viable on mobile as optional quality upgrades. [36/39]
Finally, we run the lighting pass — reading small-tile sample & shadow term textures, evaluating the BRDF per sample, and applying weights for unbiased lighting. [37/39]
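To see why the weighted sum stays unbiased, here's a toy estimator (not the production shader): pick lights proportionally to an importance weight and scale each contribution by 1/pdf. When the importance tracks the true contribution, variance collapses, which is the whole point of the resampling chain above:

```python
import random

def estimate_lighting(lights, contribution, n_samples, rng):
    """Unbiased stochastic estimate of sum_i contribution(light_i):
    pick lights proportionally to an importance weight, scale each
    picked light's contribution by 1/pdf, and average. Converges to
    the full many-light sum without evaluating every light per pixel."""
    weights = [max(contribution(l), 1e-6) for l in lights]  # importance
    total_w = sum(weights)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.random() * total_w
        run = 0.0
        for l, w in zip(lights, weights):
            run += w
            if x <= run:
                pdf = w / total_w
                acc += contribution(l) / pdf
                break
    return acc / n_samples
```

With importance exactly proportional to contribution, every sample returns the exact total: a zero-variance estimator. Real scenes only approximate this, which is where the residual noise comes from.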
Since this runs in pixel shaders, there’s some light-type divergence — but it’s minimal with only punctual lights. Compute-based variant would eliminate the divergence entirely (more relevant for area lights, etc.). [38/39]
And here’s what the final composed image looks like after the lighting pass. [39/39] [image]
