Managed to generate binding ids for the generated GLSL shader for GLES3/WebGL2 using the SPIRV-Cross API.
GLES doesn't have descriptor sets, so I have to generate a flat contiguous range of binding ids per set and store each set's start index. The runtime binds N slots at a time (bind groups).
I also have to dump a mapping table for the combined samplers in the shader, since our renderer uses separate samplers and images.
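A rough sketch of how this can look with the SPIRV-Cross C++ API (the function name, the setStartIndex scheme, and the mapping struct are my own illustration, not the actual HypeHype code):

```cpp
#include <cstdint>
#include <vector>
#include <spirv_cross/spirv_glsl.hpp>

// Sketch: flatten (set, binding) pairs into contiguous GLES binding ids and dump
// a mapping table for the combined image/samplers SPIRV-Cross creates for GLSL.
struct CombinedSamplerMapping
{
    uint32_t imageSet, imageBinding;      // original separate image
    uint32_t samplerSet, samplerBinding;  // original separate sampler
    uint32_t glslBinding;                 // binding of the generated combined sampler
};

void remapForGLES(spirv_cross::CompilerGLSL& compiler,
                  const uint32_t* setStartIndex,  // flat start index per descriptor set
                  std::vector<CombinedSamplerMapping>& outMappings)
{
    // Our SPIR-V uses separate images and samplers; GLSL for GLES needs them combined.
    compiler.build_combined_image_samplers();

    uint32_t nextCombinedBinding = 0;
    for (const auto& combined : compiler.get_combined_image_samplers())
    {
        CombinedSamplerMapping m;
        m.imageSet       = compiler.get_decoration(combined.image_id, spv::DecorationDescriptorSet);
        m.imageBinding   = compiler.get_decoration(combined.image_id, spv::DecorationBinding);
        m.samplerSet     = compiler.get_decoration(combined.sampler_id, spv::DecorationDescriptorSet);
        m.samplerBinding = compiler.get_decoration(combined.sampler_id, spv::DecorationBinding);
        m.glslBinding    = nextCombinedBinding++;
        compiler.set_decoration(combined.combined_id, spv::DecorationBinding, m.glslBinding);
        outMappings.push_back(m);
    }

    // Flatten uniform buffer bindings: set N starts at setStartIndex[N].
    spirv_cross::ShaderResources resources = compiler.get_shader_resources();
    for (const auto& ubo : resources.uniform_buffers)
    {
        uint32_t set     = compiler.get_decoration(ubo.id, spv::DecorationDescriptorSet);
        uint32_t binding = compiler.get_decoration(ubo.id, spv::DecorationBinding);
        compiler.set_decoration(ubo.id, spv::DecorationBinding, setStartIndex[set] + binding);
        compiler.unset_decoration(ubo.id, spv::DecorationDescriptorSet);
    }
}
```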
Our textures and bind groups are both immutable, so I can just store 2x GLint (64 bits) per combined sampler. It's easy to offset-allocate them all in one big buffer; a bind group just stores a start offset and a count.
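A possible layout for that (names are my own, just illustrating the idea):

```cpp
#include <cstdint>
#include <GLES3/gl3.h>

// Sketch: 2x GLint (64 bits) per combined sampler, offset-allocated into one shared
// array. Since textures and bind groups are immutable, these never need patching.
struct CombinedSamplerSlot
{
    GLint texture;  // GL texture name
    GLint sampler;  // GL sampler object name
};

struct BindGroupSamplers
{
    uint32_t startOffset;  // first CombinedSamplerSlot in the shared array
    uint32_t count;        // number of combined samplers in this bind group
};
```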
GLES doesn't have generic buffer bindings like modern APIs. Different buffer types have to be bound to separate targets, which is a bit annoying. We also need a bit of metadata in the buffer bindings to be able to bind them correctly.
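For example, each buffer binding has to carry its GL target (a hypothetical sketch, not the actual backend code):

```cpp
#include <cstdint>
#include <GLES3/gl3.h>

// Sketch: GLES buffers are bound to type-specific indexed targets, so the binding
// metadata includes the target enum in addition to the slot, buffer and range.
struct BufferBinding
{
    GLenum     target;  // e.g. GL_UNIFORM_BUFFER (SSBOs would need GLES 3.1+)
    GLuint     slot;    // binding index within that target's index space
    GLuint     buffer;  // GL buffer object name
    GLintptr   offset;
    GLsizeiptr size;
};

void bindBufferGroup(const BufferBinding* bindings, uint32_t count)
{
    // One GL call per buffer; this is the unavoidable per-resource GLES overhead.
    for (uint32_t i = 0; i < count; ++i)
    {
        const BufferBinding& b = bindings[i];
        glBindBufferRange(b.target, b.slot, b.buffer, b.offset, b.size);
    }
}
```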
I am confident I can store bind group bindings in a tight space and write optimal binding code. But it's not going to be as fast as Vulkan and Metal, since GLES requires a separate binding call per resource.
But it should be faster than traditional renderers requiring multiple software command buffer calls. We just write one uint32 to our software command buffer when we bind a group, even in GLES. And the group likely fits in one cache line. It's basically 100% GLES driver overhead.
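A rough idea of that encoding (illustrative, not the actual command stream format):

```cpp
#include <cstdint>
#include <vector>

// Sketch: binding a group appends one uint32 to the software command buffer.
// The GLES backend later expands it into the per-resource GL binding calls.
enum class Op : uint32_t { BindGroup = 1 /* , Draw, ... */ };

struct SoftwareCommandBuffer
{
    std::vector<uint32_t> words;

    void bindGroup(uint32_t groupHandle)
    {
        // 8-bit opcode + 24-bit bind group handle packed into a single word.
        words.push_back((uint32_t(Op::BindGroup) << 24) | (groupHandle & 0x00FFFFFFu));
    }
};
```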
NOTE: In the original image, there's the same binding index for the UBO and the first texture. This is intentional. GLES binding indices are type specific. Vulkan and Metal instead have shared binding slots for all resources.
Splatting gaussians instead of ray-marching. Reminds me of particle based renderer experiments. Interesting to see whether gather or scatter algos win this round.
C++20 designated initializers, C++11 struct default values, and a custom span type (with support for initializer lists) are a good combination for graphics resource creation:
Declaring default values with C++11 aggregate initialization syntax is super clean. All the API code you need is this struct. No need to implement builders or other code bloat that you need to maintain.
The C++20 span type doesn't support initializer lists, so you have to create your own. This is because initializer list lifetime is very short, and it's easy to accidentally use a dead list. I use "const &&" in the resource creation APIs to force a temporary object.
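The original tweet showed the code as an image; a minimal sketch of the same pattern (all names here are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>

enum class Format : uint8_t { RGBA8, RGBA16F };
struct TextureHandle { uint32_t index = 0; };

// Minimal span that accepts initializer lists (std::span doesn't).
template <typename T>
struct Span
{
    const T* data = nullptr;
    size_t   size = 0;

    Span() = default;
    Span(std::initializer_list<T> list) : data(list.begin()), size(list.size()) {}
};

// All defaults declared in the struct (C++11 default member initializers).
// No builders needed: the struct is the whole creation API.
struct TextureDesc
{
    uint32_t      width    = 1;
    uint32_t      height   = 1;
    uint32_t      mipCount = 1;
    Format        format   = Format::RGBA8;
    Span<uint8_t> initialData;  // backing initializer list lives only for the full expression
};

// "const &&" only binds to temporaries, so the descriptor (and the short-lived
// initializer lists inside it) must be consumed inside this call.
TextureHandle createTexture(const TextureDesc&& desc);

// Usage with C++20 designated initializers; omitted fields keep their defaults:
// TextureHandle tex = createTexture({ .width = 256, .height = 256, .format = Format::RGBA8 });
```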
I was talking about the new DOTS hybrid renderer GPU persistent data model 2 years ago at SIGGRAPH. We calculated object inverse matrices in the data upload shader, because that was practically free. ALU is free in shaders that practically just copy data around.
On mobile, memory bandwidth is a big bottleneck, and using it wastes a lot of power. Thus I prefer to pack my data and unpack it in the shader. That's usually just a few extra ALU instructions, but you get big bandwidth gains. Performance improves and perf/watt improves.
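A tiny example of the idea (my own illustration): store a per-instance color as one RGBA8 uint instead of a float4, and spend a few ALU ops in the shader to unpack it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack a float4 color into 4 bytes on the CPU: 4 bytes uploaded instead of 16.
uint32_t packUnorm4x8(float r, float g, float b, float a)
{
    auto toByte = [](float v) {
        return uint32_t(std::lround(std::clamp(v, 0.0f, 1.0f) * 255.0f));
    };
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16) | (toByte(a) << 24);
}

// GLSL ES 3.0 side (for reference), a few ALU instructions:
//   vec4 color = vec4(c & 255u, (c >> 8u) & 255u, (c >> 16u) & 255u, c >> 24u) / 255.0;
```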
Let's design a fast screen tile based local light solution for mobile and WebGL 2.0 (no compute). Per-object light lists sounded good until I realized that we have a terrain. Even the infinite ground plane is awkward to light with a per-object light list.
Thread...
No SSBOs. Uniform buffers are limited to 16KB (low end Android limitation). Up to 256 lights visible at once. Use the same float4 position + half4 color + half4 direction + cos angle setup that handles both point lights and directional lights. 32B * 256 lights = 8KB light array.
In addition to the light array we have a screen space light visibility grid. uint4 (16 bytes) per element as that's the minimum alignment for UBO arrays. If we use 64x64 tiles we fit the light grid to a 16KB UBO on all mobile resolutions.
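Laid out as structs (the 32-byte light format is from the tweet; the exact field packing and the use of the position w component are my own guesses):

```cpp
#include <cstdint>

// One light = 32 bytes: float4 position + half4 color + half3 direction + half cos(angle).
// The halves are stored as uint pairs and unpacked in the shader with unpackHalf2x16().
struct PackedLight
{
    float    positionX, positionY, positionZ, positionW;  // float4 (w use not specified in the tweet)
    uint32_t colorRG, colorBA;                             // half4 color
    uint32_t directionXY, directionZ_cosAngle;             // half3 direction + half cos(angle)
};
static_assert(sizeof(PackedLight) == 32, "light must stay 32 bytes");

// 256 lights * 32 B = 8 KB, well inside the 16 KB low-end UBO limit.
struct LightArrayUBO { PackedLight lights[256]; };

// Screen-space visibility grid: one uint4 (16 B) per 64x64-pixel tile, the minimum
// UBO array element alignment. Up to 1024 tiles * 16 B = 16 KB covers all mobile resolutions.
struct TileLightGridUBO { uint32_t tiles[1024][4]; };
```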
I have implemented practically all of the possible local light rendering algorithms during my career, yet I am considering a trivial per-object light list for HypeHype.
Kitbashed content = lots of small objects. Granularity seems fine.
Thread...
Set up all the visible local light source data into a UBO array at the beginning of the render pass. For each object, a uint32 contains four packed 8-bit light indices. At the beginning of the light loop, do a binary AND to take the lowest 8 bits, then shift down 8 bits (next light).
This is just a single extra uint per draw call, so the setup cost is trivial, assuming of course that it's fine to limit the light count to 4. We can use multiple uints if we want 8 or more lights per object. Not a problem.
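In C-like code the per-object loop looks roughly like this (illustrative; mirrors what the GLSL would do):

```cpp
#include <cstdint>

// Sketch of the per-object light loop: packedIndices carries up to four 8-bit
// indices into the UBO light array, consumed lowest byte first.
void shadeObjectLights(uint32_t packedIndices, uint32_t lightCount)
{
    for (uint32_t i = 0; i < lightCount; ++i)
    {
        uint32_t lightIndex = packedIndices & 0xFFu;  // lowest 8 bits = current light
        packedIndices >>= 8u;                         // shift down for the next light
        // ... fetch lights[lightIndex] from the UBO array and accumulate its contribution ...
        (void)lightIndex;
    }
}
```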
The high level agenda of my presentation currently looks like this. Each of the main topics has a lot of sub-topics, of course.
If you find anything missing that you would want to hear about, please reply in the thread.
Correction: The backend doesn't process or set up data.
The platform-specific backend code just passes handles and offsets around, so that the data provided directly by the user-land code is visible in the shaders. Zero copies, and no backend refactoring when the data layout changes.
IMPORTANT: The scope of this presentation is the low level gfx platform abstraction. The higher level rendering pipeline / algorithm code is out of scope. I will be talking about that later, of course. And that presentation is going to have a lot of pretty pixels too.