Jarkko Lempiäinen · Apr 17 · 40 tweets
Local lights are now live in HypeHype! 🎉

I developed a new stochastic lighting algorithm — from concept to prototype to production — in just 6 months. Super proud of this one! [Details in thread 🧵]

Here's a walkthrough of a game I relit with local lights:
Still ongoing: denoising and optimization. The goal is fully dynamic, fixed-cost local lighting with shadows — running efficiently on a wide range of mobile devices. [1/39]
HypeHype (hypehype.com) is a UGC platform with casual creators (even kids!). They can flood the scene with lights, and it has to just work — no perf drops, no ugly artifacts. A big technical challenge! [2/39]
For context, Unity HDRP uses clustered lighting with 64 lights max per 16x16px tile. More lights = tiling artifacts + higher GPU cost. That’s a no-go for us. [3/39]
We can’t rely on pro lighters working around lighting limitations. Our lighting must be seamless for non-technical users — minimal tuning, great results by default. [4/39]
Games must run decently on everything — from $100 Android phones to high-end PCs. A game made on a flagship iPhone must still play well on budget devices. [5/39]
Lighting also needs to look consistent across devices. Dropping lights could break gameplay for some random game. We could limit light count upon authoring — but where’s the fun in that? 😄 [6/39]
We explored ReSTIR, ReGIR, and similar ideas — but none were mobile-friendly enough. Definitely no ray tracing 😄 So we built our own high-performance stochastic lighting solution tailored to our needs. [7/39]
The algorithm starts by selecting 16 lights per big-tile using Weighted Reservoir Sampling (WRS) with a basic PDF, evaluated via stratified sampling over the tile area. [8/39]
Here’s what the big-tile reservoir texture looks like. Kinda boring, but each 4x4 block in this single-channel image represents a big-tile’s light reservoir. [9/39] [image]
This big-tile WRS uses "without replacement" sampling. This helps to ensure a diverse initial light set — important for the quality of the next resampling step. [10/39]
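The thread doesn't show the sampler itself, but the classic way to do streaming WRS without replacement is the Efraimidis-Spirakis exponential-key trick: keep the k items with the largest u^(1/w) keys. A minimal sketch under that assumption (the per-light weight, i.e. the "basic PDF", is whatever you plug in, e.g. attenuated intensity over the tile):

```python
import heapq
import random

def wrs_without_replacement(lights, weights, k, rng):
    """Keep the k lights with the largest u**(1/w) keys: a one-pass
    weighted sample *without replacement* (Efraimidis-Spirakis A-ES)."""
    heap = []  # min-heap of (key, light); the smallest key gets evicted
    for light, w in zip(lights, weights):
        if w <= 0.0:
            continue  # zero-weight lights can never be picked
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, light))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, light))
    return [light for _, light in heap]
```

A light with a big weight gets a key near 1 and almost always survives, but every light with nonzero weight has a chance, which is what keeps the initial set diverse.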
Next, we resample 1–4 lights per small-tile from the big-tile reservoirs using a more complete PDF (incl. shadow term). This gives high-quality light samples. [11/39]
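The thread doesn't spell out the resampling math, but the description matches standard resampled importance sampling (RIS): weight each candidate by target/source, pick proportionally, and carry the unbiased weight W = (Σw)/(M · p_target). A generic sketch, not the production code:

```python
import random

def resample_ris(candidates, source_pdf, target_pdf, n, rng):
    """Resampled importance sampling: draw n samples proportionally to
    target_pdf/source_pdf, each returned with its unbiased RIS weight
    W = (1/M) * sum(w) / target_pdf(sample). Assumes target_pdf > 0
    for every candidate."""
    M = len(candidates)
    w = [target_pdf(c) / source_pdf(c) for c in candidates]
    w_sum = sum(w)
    out = []
    for _ in range(n):
        x = rng.random() * w_sum
        acc = 0.0
        for c, wi in zip(candidates, w):
            acc += wi
            if x <= acc:
                chosen = c
                break
        out.append((chosen, w_sum / (M * target_pdf(chosen))))
    return out
```

When target equals source, every ratio weight is 1 and W collapses to 1/p, i.e. plain importance sampling, which is a handy sanity check.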
Here’s the small-tile sample texture (4 samples in RGBA). Currently using 4 spp due to lack of denoising — later we may drop to 1 spp with denoising for perf. [12/39] [image]
Each small-tile lights 64 quads (16x16px = 256 pixels). This amortizes resampling cost and improves wave coherence, since all pixels in a tile share the same light set. [13/39]
We built this to run entirely in pixel shaders — to take advantage of framebuffer compression and keep bandwidth low, which is important on mobile. [14/39]
That said, reservoir textures are small and somewhat random, so not sure how much FBC helps there. Compute shader variants with some optimization opportunities are on our roadmap. [15/39]
Currently, we support only point and spot lights, but more types could be added. If VGPR pressure rises, we may classify tiles by light type to improve occupancy. [16/39]
A challenge with small-tile stochastic sampling is correlation artifacts (tiling). We mitigate this by interleaving samples between adjacent tiles using a Gaussian-Poisson distribution. [17/39]
To further decorrelate, we offset the interleaved tiles across frames. This helps with temporal accumulation like TAA. [18/39]
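The thread doesn't say how the per-frame offsets are generated; one cheap, common approach is a small integer hash of (tile, frame), so neighboring tiles and consecutive frames land on different offsets. Everything below (constants included) is illustrative:

```python
def tile_offset(tile_x, tile_y, frame, period=8):
    """Hypothetical decorrelation hash: gives each tile a sample-set
    offset in [0, period) that also shifts every frame, so tiling
    patterns don't line up spatially or temporally."""
    h = (tile_x * 73856093) ^ (tile_y * 19349663) ^ (frame * 83492791)
    h = (h ^ (h >> 13)) & 0xFFFFFFFF  # cheap bit mixing
    return h % period
```

Because the offset is a pure function of (tile, frame), it needs no extra storage and stays consistent across the GPU passes that have to agree on it.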
No denoiser yet — TAA is next. We’ll see how far we get with that "for free" before introducing a dedicated spatio-temporal denoiser. [19/39]
For shadows, we support static & dynamic shadows for point and spot lights. Shadows are ON by default, so even casual users get nice lighting out of the box. [20/39]
Point lights use cubemaps remapped to octahedral space. Spot lights use regular shadow maps. All shadows share a fixed-size atlas, so it's important to use the space efficiently. [21/39]
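The octahedral remap mentioned here is a standard, well-documented mapping: project the direction onto the octahedron |x|+|y|+|z| = 1, then fold the lower hemisphere into the corners of a square. A reference sketch:

```python
def _sign(x):
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(d):
    """Unit direction -> point in the [-1,1]^2 octahedral square."""
    x, y, z = d
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:  # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x, y

def oct_decode(u, v):
    """Inverse of oct_encode: octahedral square back to a unit vector."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:  # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    n = (u * u + v * v + z * z) ** 0.5
    return u / n, v / n, z / n
```

The payoff for shadow maps: one square 2D texture per point light instead of six cube faces, which is exactly what lets them pack into a flat atlas.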
Here’s the shadow map atlas — a mix of shadow map sizes updated dynamically as lights and the camera move. 16bpp single-channel texture to optimize memory usage & bandwidth. [22/39] [image]
Shadow map size is based on the light's distance to the camera and its luminous flux. Maps are allocated/resized/deallocated as needed when those properties change. [23/39]
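The exact sizing formula isn't given in the thread; a plausible heuristic (all constants hypothetical) grows resolution with flux, shrinks it with distance, and snaps to a power of two so maps pack cleanly into the atlas:

```python
import math

def shadow_map_size(distance, flux, base=4096.0, min_size=64, max_size=1024):
    """Hypothetical shadow-map sizing: bright and/or close lights get
    more texels, clamped to an atlas-friendly power-of-two range."""
    raw = base * math.sqrt(flux) / max(distance, 1e-3)
    size = 2 ** round(math.log2(max(raw, 1.0)))
    return int(min(max(size, min_size), max_size))
```

Snapping to powers of two also makes the resize decision stable: small changes in distance or flux usually land on the same bucket, so maps aren't reallocated every frame.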
We reserve a budget for shadow map updates per frame and spread the updates over multiple frames if necessary. [24/39]
For static shadow maps we render only static objects and push them to an update queue only when needed (e.g. upon movement, rotation, camera thresholds, etc.) [25/39]
Dynamic shadows are opt-in. When enabled, both static and dynamic objects are rendered into the map. They are prioritized higher and updated in round-robin based on update age. [26/39]
If too many dynamic shadows are queued, we always squeeze in one static shadow update per frame to avoid starvation. [27/39]
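The scheduling policy from the last two tweets can be sketched as a per-frame planner. Names and data layout here are illustrative, not the actual engine structures:

```python
from collections import deque

def plan_updates(dynamic_queue, static_queue, budget):
    """Pick this frame's shadow-map updates: dynamic maps win (oldest
    update age first), but if any static update is waiting, one budget
    slot is always reserved for it so statics can't starve."""
    updates = []
    reserve_static = 1 if static_queue else 0
    # Dynamic maps, oldest first, filling all but the reserved slot.
    for entry in sorted(dynamic_queue, key=lambda e: -e["age"]):
        if len(updates) >= budget - reserve_static:
            break
        updates.append(entry["id"])
    # Fill the rest (at least the reserved slot) from the static queue.
    while static_queue and len(updates) < budget:
        updates.append(static_queue.popleft())
    return updates
```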
Because shadow updates may be delayed, we use the shadow map’s capture position instead of the current light position for shadow evaluation. [28/39]
Once the shadow atlas is updated, shadow terms are evaluated in a separate quad-resolution deferred shadow pass before the lighting pass. This deferred approach also simplifies the lighting shader. [29/39]
We compute shadow terms per small-tile quad (64 quads). With a fixed number of samples, we store the terms in a fixed-size texture matching the small-tile light samples. [30/39]
We also evaluate IES light profiles during the shadow term pass. Here’s what the shadow term texture looks like — terms stored in RGBA channels. [31/39] [image]
To reduce wave divergence in shadow evaluation, we order interleaved small-tiles in 8x8px squares — helping GPUs pack executions into coherent waves. [32/39]
Deferred shadows = simpler lighting shader & reduced VGPR pressure. They also avoid divergence in the lighting pass: different lights may need different shadow sampling, but since shadow terms are precomputed, the lighting shader never branches on it. [33/39]
We’ll still profile whether the extra memory + pass cost is worth it vs shader complexity reduction and quad-rate evaluation. [34/39]
For shadow filtering, we currently use both 4x PCF (gather-based) and stochastic PCF. The stochastic variant is quite efficient, requiring a single gather per term. Noise is noticeable, but TAA/denoising should smooth it out. [35/39]
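The idea behind the stochastic variant: instead of averaging many PCF taps, jitter a single tap inside the filter disc so the average over pixels/frames approximates the full filter. A simplified sketch with a plain depth compare standing in for the hardware gather:

```python
import math
import random

def stochastic_pcf(shadow_map, uv, receiver_depth, radius, rng):
    """One jittered tap inside a disc of `radius` texels, then a single
    depth compare. Averaged over pixels/frames this approximates a full
    multi-tap PCF filter, trading taps for noise that TAA can absorb."""
    h = len(shadow_map)
    w = len(shadow_map[0])
    angle = rng.random() * 2.0 * math.pi
    r = radius * math.sqrt(rng.random())  # uniform over the disc area
    x = min(max(int(uv[0] * w + r * math.cos(angle)), 0), w - 1)
    y = min(max(int(uv[1] * h + r * math.sin(angle)), 0), h - 1)
    return 1.0 if receiver_depth <= shadow_map[y][x] else 0.0  # 1 = lit
```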
We’re exploring PCSS for contact-hardening shadows and screen-space ray-marched shadows for fine detail. With deferred shadows at quad-rate, these are potentially viable on mobile as optional quality upgrades. [36/39]
Finally, we run the lighting pass — reading small-tile sample & shadow term textures, evaluating the BRDF per sample, and applying weights for unbiased lighting. [37/39]
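To see why the weighted sum stays unbiased, here's a toy estimator (not the production shader): pick lights proportionally to an importance weight and scale each contribution by 1/pdf. When the importance tracks the true contribution, variance collapses, which is the whole point of the resampling chain above:

```python
import random

def estimate_lighting(lights, contribution, n_samples, rng):
    """Unbiased stochastic estimate of sum_i contribution(light_i):
    pick lights proportionally to an importance weight, scale each
    picked light's contribution by 1/pdf, and average. Converges to
    the full many-light sum without evaluating every light per pixel."""
    weights = [max(contribution(l), 1e-6) for l in lights]  # importance
    total_w = sum(weights)
    acc = 0.0
    for _ in range(n_samples):
        x = rng.random() * total_w
        run = 0.0
        for l, w in zip(lights, weights):
            run += w
            if x <= run:
                pdf = w / total_w
                acc += contribution(l) / pdf
                break
    return acc / n_samples
```

With importance exactly proportional to contribution, every sample returns the exact total: a zero-variance estimator. Real scenes only approximate this, which is where the residual noise comes from.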
Since this runs in pixel shaders, there’s some light-type divergence — but it’s minimal with only punctual lights. Compute-based variant would eliminate the divergence entirely (more relevant for area lights, etc.). [38/39]
And here’s what the final composed image looks like after the lighting pass. [39/39] [image]
