Sebastian Aaltonen @SebAaltonen
These innocent-looking 10 lines of code generate more than 600 GPU instructions... oops :)
The code reads a 3x3 neighborhood from groupshared memory and combines some bit masks to set up indirect dispatch tile coordinates (later in the shader). Time to refactor it...
The HLSLcc GLSL output looks a bit suboptimal. IMHO it leans a bit too much on the IHV compiler to massage the code. My bit extract code does way too much work in this case (the result is only compared > 0 afterwards). I could SWAR through this mess :)
I have 3 counters packed in one uint (10+12+10 bits). I can binary OR the packed counters directly (SWAR style), because binary OR will only be >0 if any of the combined masks are >0. This of course messes up the counter values, but >0 is the only case I care about.
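A minimal sketch of the idea, written in C rather than HLSL so it can be compiled on its own. The thread only gives the field widths (10+12+10 bits); the bit positions, the function names, and the 9-element neighbor array are assumptions made for illustration, not the original shader code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (only the 10+12+10 widths come from the thread):
   a = bits 0..9, b = bits 10..21, c = bits 22..31. */
enum { A_BITS = 10, B_BITS = 12, C_BITS = 10 };
enum { A_SHIFT = 0, B_SHIFT = 10, C_SHIFT = 22 };

static uint32_t field_mask(int bits) { return (1u << bits) - 1u; }

static uint32_t pack_counters(uint32_t a, uint32_t b, uint32_t c) {
    /* Each counter must fit in its field. */
    assert(a <= field_mask(A_BITS) && b <= field_mask(B_BITS) && c <= field_mask(C_BITS));
    return (a << A_SHIFT) | (b << B_SHIFT) | (c << C_SHIFT);
}

static uint32_t extract(uint32_t packed, int shift, int bits) {
    return (packed >> shift) & field_mask(bits);
}

/* Naive version: unpack every field of every neighbor, then compare.
   That is 27 extracts and 27 compares for a 3x3 neighborhood. */
static uint32_t any_nonzero_naive(const uint32_t n[9]) {
    uint32_t flags = 0;
    for (int i = 0; i < 9; ++i) {
        if (extract(n[i], A_SHIFT, A_BITS) > 0) flags |= 1u;
        if (extract(n[i], B_SHIFT, B_BITS) > 0) flags |= 2u;
        if (extract(n[i], C_SHIFT, C_BITS) > 0) flags |= 4u;
    }
    return flags;
}

/* SWAR version: OR the packed words first, then test each field once.
   The merged counter values are meaningless, but a field of the merged
   word is nonzero exactly when that field is nonzero in some neighbor. */
static uint32_t any_nonzero_swar(const uint32_t n[9]) {
    uint32_t merged = 0;
    for (int i = 0; i < 9; ++i) merged |= n[i];

    uint32_t flags = 0;
    if (extract(merged, A_SHIFT, A_BITS) > 0) flags |= 1u;
    if (extract(merged, B_SHIFT, B_BITS) > 0) flags |= 2u;
    if (extract(merged, C_SHIFT, C_BITS) > 0) flags |= 4u;
    return flags;
}
```

Both functions return the same flag bits for any input; the SWAR one just does 3 extracts instead of 27.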
Yay. 8x fewer instructions now for that piece of code. That shader became 0.4 ms faster :)
You could also SWAR add the packed counters, but you risk overflow. Binary OR acts like a per-bit adder with no carry (each bit just saturates at 1), so it is guaranteed not to overflow into the neighboring counter, and guaranteed to handle the != 0 case correctly.
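To make the overflow risk concrete, here is a small C sketch comparing a plain add with a binary OR on two packed words. The bit positions and helper names are assumptions (the thread only states the 10+12+10 widths), and the input values are picked purely to trigger the failure cases.

```c
#include <stdint.h>

/* Hypothetical layout: a = bits 0..9, b = bits 10..21, c = bits 22..31. */
static uint32_t pack(uint32_t a, uint32_t b, uint32_t c) {
    return (a & 0x3FFu) | ((b & 0xFFFu) << 10) | ((c & 0x3FFu) << 22);
}

static uint32_t field_a(uint32_t p) { return p & 0x3FFu; }
static uint32_t field_b(uint32_t p) { return (p >> 10) & 0xFFFu; }
static uint32_t field_c(uint32_t p) { return (p >> 22) & 0x3FFu; }

/* SWAR add: one 32-bit add over all three fields. A field that exceeds
   its width carries into the next field (or out of the word). */
static uint32_t swar_add(uint32_t x, uint32_t y) { return x + y; }

/* SWAR OR: no bit ever influences a neighboring bit, so fields cannot
   leak into each other. */
static uint32_t swar_or(uint32_t x, uint32_t y) { return x | y; }
```

With `x = y = pack(600, 0, 0)`, the add produces 1200 in the low bits, which does not fit in 10 bits, so `field_b` of the sum reads 1 even though both inputs had `b == 0` (a false positive). With `x = y = pack(0, 0, 512)`, the add carries out of bit 31 and `field_c` of the sum reads 0 even though both inputs were nonzero (a false negative). The OR keeps `field_b == 0` in the first case and `field_c == 512` in the second.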