I've developed a technique that gives @DirectX12 something nearly as efficient as native specialization constants (SCs) would be.

First of all, @VulkanAPI (SPIR-V) features SCs, which are as powerful as compile-time constants in terms of optimization (branch pruning, informed loop unrolling, etc.) and as flexible as uniforms in that they can be set at runtime (not without a cost that uniforms don't suffer from, but still after the shader has already been compiled to SPIR-V).

Unfortunately, D3D12 doesn't provide anything like that. [...]
Consider this HLSL pixel shader.

On lines 1 and 2 we have a boolean and an integer, respectively, that we would like to be different across multiple versions of the shader.

So far, the only option was to programmatically patch the HLSL code and compile every time. [...]
This method makes it possible to act directly on the DXIL representation of a shader, so there's no need to go all the way back to HLSL and pay the cost of parsing a high-level language again and again.

1. HLSL PATCHING [...]
First, replace every `define` with a `volatile` integer variable and assign consecutive values, starting with 0, like in the picture.
Our former BLACKOUT define becomes `sc_blackout` (our SC 0).
Our former BRIGHTEN_PASSES define becomes `sc_brighten_passes` (our SC 1). [...]
By using `volatile` on those variables we can be sure that DXC can't optimize out anything about them, so they will stay even at `-O3` in the generated DXIL. This is very important. Optimizations will happen later. [...]
It's also important that the HLSL at hand doesn't use `volatile` itself, or the process will get confused and explode.
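
As a rough sketch of what that rewrite amounts to, the following Python helper performs it on a made-up shader. The shader text, the helper name `defines_to_volatiles` and the exact declaration form (`volatile int sc_... = N;`) are assumptions made for illustration, not the author's actual source. Note that the initializer is the SC index, not the value the define used to have.

```python
import re

def defines_to_volatiles(hlsl: str, names: list[str]) -> str:
    """Turn `#define NAME ...` lines into numbered volatile ints and rename uses."""
    index_of = {name: i for i, name in enumerate(names)}
    out = []
    for line in hlsl.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\b", line)
        if m and m.group(1) in index_of:
            name = m.group(1)
            # Assumed declaration form; the initializer is the SC index.
            out.append(f"volatile int sc_{name.lower()} = {index_of[name]};")
            continue
        # Rename the remaining uses of the old macro names.
        for name in names:
            line = re.sub(rf"\b{name}\b", f"sc_{name.lower()}", line)
        out.append(line)
    return "\n".join(out)

# Hypothetical shader, only loosely modeled on the one described in the thread.
SOURCE = """#define BLACKOUT true
#define BRIGHTEN_PASSES 3
float4 main() : SV_Target {
    if (BLACKOUT) return 0;
    for (int i = 0; i < BRIGHTEN_PASSES; i++) {}
    return 1;
}"""

PATCHED = defines_to_volatiles(SOURCE, ["BLACKOUT", "BRIGHTEN_PASSES"])
```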

2. HLSL TO DXIL
Compile normally with DXC to get a DXBC container. [...]
3. DXIL PATCHING
The goal of this section is to prepare our DXIL so it's ready to be quickly patched with the SC values we want whenever we create a new pipeline. To be clear, everything in this section has to be done only once per shader and can be cached. [...]
For this patching, since DXIL is nothing more than LLVM IR bitcode, we will use some pieces of LLVM (or, more precisely, of the slightly customized LLVM present in DXC). [...]
3.1. PREPARATION
- Find the DXIL chunk in the DXBC container.
- Extract the bitcode stream (skip some header, etc.).
- Deserialize the stream into functions-blocks-instructions. [...]
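
The first two steps can be sketched in Python as follows. The layout used here (a 32-byte container header, a table of part offsets, and a DXIL part made of a program header followed by a bitcode header) is my reading of DXC's `DxilContainer.h`, so treat the field order as an assumption to double-check; `find_dxil_bitcode` is a hypothetical helper name. Deserializing the bitcode itself is left to LLVM.

```python
import struct

def find_dxil_bitcode(container: bytes) -> bytes:
    """Return the raw LLVM bitcode stored in the DXIL part of a container."""
    # Assumed header: fourcc, 16-byte digest, version (2 x u16), total size, part count.
    magic, _digest, _major, _minor, _total, part_count = struct.unpack_from(
        "<4s16sHHII", container, 0)
    if magic != b"DXBC":
        raise ValueError("not a DXBC container")
    offsets = struct.unpack_from(f"<{part_count}I", container, 32)
    for off in offsets:
        fourcc, _part_size = struct.unpack_from("<4sI", container, off)
        if fourcc != b"DXIL":
            continue
        program = off + 8        # program header: version, size in dwords
        bc_header = program + 8  # bitcode header: magic, version, offset, size
        dxil_magic, _ver, bc_offset, bc_size = struct.unpack_from(
            "<4sIII", container, bc_header)
        if dxil_magic != b"DXIL":
            raise ValueError("unexpected bitcode header")
        start = bc_header + bc_offset  # offset is relative to the bitcode header
        return container[start:start + bc_size]
    raise ValueError("no DXIL part found")
```
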
3.2. COLLECTION OF ALLOCATIONS OF VOLATILES
We have to make a database of the constants' allocations (where our 0 and 1 live). Steps:
- Locate all the volatile stores, which look like `store volatile i32 9, i32* %59, align 4`. [...]
- For each of them, follow the pointer to the source value (%59 in this example) to find the corresponding 'alloca', which looks like this: `%63 = bitcast i32* %59 to i8*`.
- Therefore, so to speak, we know that our SC lives at %63. (That's why we needed to assign consecutive values: to be able to map each store instruction to its corresponding SC.)

(The 9 in the example would correspond to SC 9 if we had that many. For us, they are our 0 and 1.)
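
To illustrate the idea of the collection step, here is a toy version in Python that works on textual LLVM IR rather than on LLVM's in-memory representation, which is what the real process uses. The IR lines, `collect_sc_slots` and `SLOTS` are made up for the example; the point is only that the stored constant doubles as the SC index.

```python
import re

# Matches e.g. `store volatile i32 9, i32* %59, align 4`.
STORE_RE = re.compile(r"store volatile i32 (\d+), i32\* (%\w+)")

def collect_sc_slots(ir_lines):
    """Map each SC index to the pointer its volatile store writes to."""
    slots = {}
    for line in ir_lines:
        m = STORE_RE.search(line)
        if m:
            # The stored value is the SC index assigned during HLSL patching.
            slots[int(m.group(1))] = m.group(2)
    return slots

# Hypothetical IR fragment with two SCs.
IR = [
    "%59 = alloca i32, align 4",
    "%60 = alloca i32, align 4",
    "store volatile i32 0, i32* %59, align 4",
    "store volatile i32 1, i32* %60, align 4",
    "%70 = load volatile i32, i32* %59, align 4",
]
SLOTS = collect_sc_slots(IR)
```
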
3.3. PATCHING WITH CONSTANTS + CLEANUP
We'll change the code so it loads constants instead of the volatiles and then get rid of everything that is no longer relevant.

- Locate all the volatile loads which are users (LLVM jargon) of each SC location we collected earlier, which look like `%590 = load volatile i32, i32* %61, align 4`.
- Replace each (the %590 in the example) with a constant holding a special value, computed as 0x45678900 + the index of the SC whose location is being processed (our 0 or 1). [...]
We can call this a sentinel value, and we'll see the rationale for it later.
- Remove, in cascade, any instruction that referred to instructions that no longer exist. [...]
3.4. WRAPUP
- Serialize the functions-blocks-instructions back into LLVM bitcode.
- Fixup the header of the chunk and put the new code in.
- Rebuild the DXBC container. [...]
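
Going back to the PATCHING WITH CONSTANTS + CLEANUP step, the load replacement and the cleanup can be mimicked on textual IR with a few lines of Python. This is only a toy model under stated assumptions: the real work happens on LLVM's in-memory IR, and `patch_loads`, the IR fragment and the slot map are hypothetical. Only the sentinel base 0x45678900 comes from the thread.

```python
import re

SENTINEL_BASE = 0x45678900
LOAD_RE = re.compile(r"(%\w+) = load volatile i32, i32\* (%\w+)")

def patch_loads(ir_lines, slots):
    """Replace volatile loads of SC slots with sentinel constants; drop dead code."""
    index_of = {ptr: idx for idx, ptr in slots.items()}
    replacements = {}
    kept = []
    for line in ir_lines:
        m = LOAD_RE.search(line)
        if m and m.group(2) in index_of:
            # The load result becomes the constant SENTINEL_BASE + SC index.
            replacements[m.group(1)] = str(SENTINEL_BASE + index_of[m.group(2)])
            continue  # the load itself disappears
        kept.append(line)
    out = []
    for line in kept:
        # Cleanup: anything still mentioning an SC slot (alloca, store) is dead.
        if any(re.search(re.escape(p) + r"(?!\w)", line) for p in index_of):
            continue
        for reg, const in replacements.items():
            line = re.sub(re.escape(reg) + r"(?!\w)", const, line)
        out.append(line)
    return out

# Hypothetical IR fragment; slots map SC index -> pointer.
IR = [
    "%59 = alloca i32, align 4",
    "%60 = alloca i32, align 4",
    "store volatile i32 0, i32* %59, align 4",
    "store volatile i32 1, i32* %60, align 4",
    "%590 = load volatile i32, i32* %59, align 4",
    "%591 = icmp ne i32 %590, 0",
    "%592 = load volatile i32, i32* %60, align 4",
    "%593 = add i32 %592, 1",
]
PATCHED_IR = patch_loads(IR, {0: "%59", 1: "%60"})
```

After this, the only trace of each SC left in the code is its sentinel constant.
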
4. SC BIT OFFSET DATABASE
During the serialization to DXIL, a few changes made to our copy of portions of LLVM have done us a favor.

Namely, whenever a constant of the form 0x45678900 + SC index (the sentinel from earlier) is streamed out to the bitcode, a pair is recorded in a map.
- The key is the value of the constant mod the magic value, which is our SC id (our 0 or 1, again).
- The value is the count of bits written to the stream so far, which is nothing other than the bit offset where our SC lives in the bit stream. [...]
Another nice thing our patch to DXC's LLVM does is replace the value of the constant with a known, fixed value (1000001b) that we can later compare against to verify we're about to patch the right spot. [...]
(To dodge for now some more complex situations caused by LLVM's VBR, we are constraining our values to 7 bits.)
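
A minimal sketch of that bookkeeping, in Python, could look like the class below. It is a deliberately simplified stand-in for the patched LLVM bitstream writer: it ignores VBR and abbreviations altogether and just emits fixed-width fields LSB first. The class name, the detection range for sentinels and the field widths are assumptions; the sentinel base and the 1000001b placeholder are the ones from the thread.

```python
SENTINEL_BASE = 0x45678900
PLACEHOLDER = 0b1000001  # known value written in place of each SC
SC_BITS = 7

class RecordingBitWriter:
    """Toy bit writer that remembers where each SC placeholder was written."""

    def __init__(self):
        self.bits = []        # one 0/1 entry per bit, in stream order
        self.sc_offsets = {}  # SC id -> bit offset of its placeholder

    def emit(self, value, width):
        for i in range(width):  # least significant bit first
            self.bits.append((value >> i) & 1)

    def emit_constant(self, value, width=32):
        # Assumed detection rule: the sentinel base plus a small index.
        if SENTINEL_BASE <= value < SENTINEL_BASE + (1 << SC_BITS):
            sc_id = value % SENTINEL_BASE
            self.sc_offsets[sc_id] = len(self.bits)  # bits written so far
            self.emit(PLACEHOLDER, SC_BITS)
        else:
            self.emit(value, width)

writer = RecordingBitWriter()
writer.emit(0b101, 3)                    # unrelated data
writer.emit_constant(SENTINEL_BASE + 1)  # SC 1
writer.emit_constant(42, 8)              # ordinary constant
writer.emit_constant(SENTINEL_BASE + 0)  # SC 0
```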

5. SETTING PROPER SC VALUES
This is again something that has to be done only once and can be cached. [...]
We have to do a first full round of overwriting every occurrence of 1000001b with the real default values we want in our SCs. That's easy now that we have our SC bit offset database.

Later we can overwrite them again for any SC we want to give a new value to, which means going through the next steps again. [...]
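
The overwrite itself boils down to rewriting 7 bits at a known bit offset after checking that the expected value is there. Here is a small Python sketch, assuming bits are addressed LSB first within each byte (which is how I understand the LLVM bitstream to be laid out) and using hypothetical helper names:

```python
PLACEHOLDER = 0b1000001
SC_BITS = 7

def read_bits(buf, bit_offset, width):
    """Read `width` bits starting at `bit_offset`, LSB first."""
    value = 0
    for i in range(width):
        pos = bit_offset + i
        value |= ((buf[pos // 8] >> (pos % 8)) & 1) << i
    return value

def write_bits(buf, bit_offset, width, value):
    """Write the low `width` bits of `value` at `bit_offset`, LSB first."""
    for i in range(width):
        pos = bit_offset + i
        mask = 1 << (pos % 8)
        if (value >> i) & 1:
            buf[pos // 8] |= mask
        else:
            buf[pos // 8] &= ~mask & 0xFF

def set_sc(buf, bit_offset, new_value, expected=PLACEHOLDER):
    """Overwrite one SC in place after verifying what is currently there."""
    if not 0 <= new_value < (1 << SC_BITS):
        raise ValueError("SC values are limited to 7 bits")
    if read_bits(buf, bit_offset, SC_BITS) != expected:
        raise ValueError("unexpected bits at SC offset")
    write_bits(buf, bit_offset, SC_BITS, new_value)
```

On the first round `expected` is the placeholder; on later rounds it would be whatever value was written last, for example `set_sc(buf, offset, 100, expected=3)`.
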
6. DXIL SIGNATURE
After any change to the DXIL bit stream, the DXBC container has to be signed again. (Note that the first compilation via DXC would have already signed it if DXIL.dll is in the neighborhood, but that signature is no longer valid.) [...]
7. PIPELINE CREATION
The new patched DXBC container is suitable to be used to create a new render pipeline.

Something great to know is that the GPU driver performs further optimizations on top of those done by DXC. [...]
Therefore, once our DXIL bitcode is rid of volatile variables and is just dealing with constants, the GPU driver is able to make smart decisions. Thanks to the AMD @Radeon dev tools, I've checked that the ISA assembly is indeed optimized depending on the value of the SCs. [...]
(I guess the same happens on @nvidia and others.)

In the images you can see the diff of the GCN assembly listings of the exact same shader, whose original high-level source code performs or skips a multiplication based on the value of an SC.
8. FINAL WORDS
You may want to have a look at some code implementing this technique. Please stay tuned to #GodotEngine during the following weeks, because something special may happen!

Lastly, let me go over the limitations of this method: [...]
- Values can't take more than 7 bits (therefore, integers are limited to 127). It's solvable with some extra work against the VBR encoding, though.
- Floating point SCs are not supported. It may be possible, but I haven't pursued it. [...]
- Legit values matching the sentinel may appear in regular shader code. The base sentinel value has been chosen carefully to keep the chances to a minimum, but it's theoretically possible. [...]
- The DXIL container must be signed, which is not ideal. That's a cost that native SCs wouldn't have to pay. In any case this is still more efficient than going all the way back to HLSL.

Pedro J. Estébanez™
