The GE8320 Z-acne issue is likely caused by the 24-bit Z buffer plus some missing Z bias state in my code, which makes the shadow maps produce acne against the surface. Will investigate that. GE8320 doesn't support 32+8 Z+stencil. Have to use 24+8.
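For context, the missing Z bias state would be something along these lines in Vulkan terms (a hedged sketch; the bias values and the dynamic-state setup are placeholders, not my actual configuration):

```cpp
#include <vulkan/vulkan.h>

// Setting a depth bias for the shadow map pass. Requires the pipeline to
// enable VK_DYNAMIC_STATE_DEPTH_BIAS. Values here are only placeholders.
void setShadowDepthBias(VkCommandBuffer cmd)
{
    vkCmdSetDepthBias(cmd,
                      1.25f,  // constant bias, in units of the smallest depth delta (coarser with 24-bit Z)
                      0.0f,   // bias clamp (0 = unclamped)
                      1.75f); // slope-scaled bias for surfaces at glancing angles
}
```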
Will do proper profiling later with actual GPU tools. I just got Android working today, and I'm running initial tests to see that things work correctly. I am glad that there are no perf regressions on low end.
Fixed the GE8320 issue. It was actually an interpolation precision issue with the material ID. Fixed it and it renders correctly. No perf impact.
Currently I invalidate my handles (bump the generation index) when the gfx resource is destroyed. The resource itself is put into a delete queue, waiting until the GPU has finished that frame.
I am considering deferring the handle invalidation too...
Currently I push all passes and draw commands to big queues and these queues are processed later. Will be done in threads. It's convenient to be able to delete resources (render targets, passes, textures, buffers) before the handles are dereferenced.
Currently you can't delete the resources (handles) immediately, as the deferred rendering will try to deref the handles and get null, since the generation bits no longer match. But the resource itself is still there, because of the deferred deletion.
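Roughly, the setup looks something like this (a minimal sketch with hypothetical names, not the actual HypeHype code):

```cpp
#include <cstdint>
#include <vector>

static constexpr uint32_t kMaxFramesInFlight = 3;

struct TextureHandle { uint32_t index; uint32_t generation; };

struct TexturePool
{
    struct Slot { void* resource = nullptr; uint32_t generation = 0; };
    std::vector<Slot>  slots;
    std::vector<void*> deleteQueue[kMaxFramesInFlight]; // freed once the GPU finishes that frame

    void* resolve(TextureHandle h) const
    {
        const Slot& s = slots[h.index];
        return (s.generation == h.generation) ? s.resource : nullptr; // stale handle -> null
    }

    void destroy(TextureHandle h, uint32_t frameIndex)
    {
        Slot& s = slots[h.index];
        deleteQueue[frameIndex].push_back(s.resource); // defer the actual delete
        s.resource = nullptr;
        ++s.generation;                                // invalidate existing handles immediately
    }
};
```

Deferring the handle invalidation too would mean bumping the generation only when the delete queue is flushed, so handles stay dereferenceable until the GPU is done with the frame.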
This is what happens when you try to do two refactorings at the same time. I was lazy and tried to save some time :)
The following Metal object is being destroyed while still required to be alive by the command buffer 0x134808e00: [...] label = CAMetalLayer Drawable
Everything works fine on Metal. It's just that the Vulkan backend (running here on top of MoltenVK) broke. I also merged all my Vulkan changes that made it run on Android phones AND refactored the draw stream bindings. You get what you ask for :)
Now I have a separate API for starting the main display pass. It returns the swap chain render pass handle and a command buffer handle.
Thread...
On Metal, starting the main display pass acquires the drawable. Acquiring the drawable causes a CPU stall if a swap chain buffer is not available, so you need to do it as late as possible. This way the offscreen passes can be pushed to the GPU before the CPU stall.
Presenting the display is also a command in Metal. It needs to be pushed to the command buffer. Now I don't have a separate present API anymore. The main command buffer submit writes a present command at the end of the command buffer automatically.
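The API shape is roughly like this (a sketch with invented names, not the real interface):

```cpp
#include <cstdint>

// Placeholder handle types for illustration.
struct RenderPassHandle    { uint32_t index = 0; uint32_t generation = 0; };
struct CommandBufferHandle { uint32_t index = 0; uint32_t generation = 0; };

struct MainDisplayPass
{
    RenderPassHandle    swapChainPass; // render pass targeting the acquired drawable
    CommandBufferHandle commands;      // command buffer the display pass records into
};

// Called as late as possible in the frame. On the Metal backend this is where
// the drawable gets acquired, which may stall the CPU if no swap chain buffer
// is available yet -- so offscreen passes should already be in flight.
MainDisplayPass beginMainDisplayPass();

// Submitting the main command buffer appends the present command at the end,
// so there is no separate present() entry point anymore.
void submitMainCommandBuffer(CommandBufferHandle commands);
```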
But my implementation is dead simple. I have a fence between all render passes. I allow the next render pass to run its vertex shaders before the previous pass finishes. This is the biggest optimization you want on mobile TBDR GPUs.
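In Vulkan terms that overlap maps roughly to a barrier like this (a sketch of the standard TBDR trick, not necessarily how my backend expresses it):

```cpp
#include <vulkan/vulkan.h>

// The previous pass's color attachment writes only need to be visible to the
// next pass's *fragment* stage, so the next pass's vertex work is free to
// overlap with the previous pass's fragment work -- the key win on TBDR GPUs.
void barrierBetweenPasses(VkCommandBuffer cmd)
{
    VkMemoryBarrier barrier{};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // src: wait for the previous pass's color writes
        VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         // dst: only fragment reads wait; vertex shaders run early
        0,                                             // no dependency flags
        1, &barrier,
        0, nullptr,
        0, nullptr);
}
```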
But this is not yet shippable. Currently HypeHype doesn't sample render target textures in the vertex shader, but somebody could implement a shader like that and my dead simple fence implementation would fail.
I guess it's time for another uint64 bitfield for used render targets. Store that in bind groups and render passes. I could have an array of 64 fences. Update a fence at the end of a render pass for each RT whose store op != don't care. Wait on it when the RT is used the next time.
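Something along these lines (a rough, untested sketch; the fence types and backend calls are placeholders):

```cpp
#include <bit>
#include <cstdint>

static constexpr uint32_t kMaxTrackedRenderTargets = 64;

// Placeholder types and backend entry points for illustration only.
struct FenceHandle         { uint32_t index = 0; };
struct CommandBufferHandle { uint32_t index = 0; };
void signalFence(CommandBufferHandle cmd, FenceHandle fence);
void waitFence(CommandBufferHandle cmd, FenceHandle fence);

struct RenderPassDesc { uint64_t writtenRenderTargets = 0; }; // bits set for RTs with store op != don't care
struct BindGroupDesc  { uint64_t sampledRenderTargets = 0; }; // bits set for RTs sampled by the bind group

struct RenderTargetFences
{
    FenceHandle fences[kMaxTrackedRenderTargets];

    // End of render pass: signal a fence for every RT that was actually stored.
    void signalWritten(CommandBufferHandle cmd, uint64_t writtenMask)
    {
        while (writtenMask)
        {
            uint32_t bit = std::countr_zero(writtenMask);
            signalFence(cmd, fences[bit]);
            writtenMask &= writtenMask - 1; // clear the lowest set bit
        }
    }

    // Before work that samples RTs: wait on the fence of every RT it reads.
    void waitSampled(CommandBufferHandle cmd, uint64_t sampledMask)
    {
        while (sampledMask)
        {
            uint32_t bit = std::countr_zero(sampledMask);
            waitFence(cmd, fences[bit]);
            sampledMask &= sampledMask - 1;
        }
    }
};
```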
Based on feedback, it seems that nobody is complaining about the allocator algorithmic details or code clarity, but people are complaining about these two delete[] calls :)
Not going to include the std::unique_ptr header just to remove two lines of trivial code.
Yes, I know that I must implement a move constructor. That's an extra 4 lines of code.
But this code isn't even tested yet. Going to do that once I have written a test suite and fixed all the bugs. Those things have to wait.
I have had bad experiences using single header libraries that lean a lot on std headers. Magic enum, phmap and similar add a massive cost to compile time due to their dependency on complex std headers. I made HypeHype's compile time 2x faster last summer by cutting std dependencies.
This is the memory allocator comparison paper I mentioned in my threads. My allocator should be similar to TLSF since I also use a two-level bitfield and a floating point distribution for the bins. But I don't tie the bitfield levels 1:1 to the float mantissa/exponent.
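The floating point bin distribution idea looks roughly like this (my own sketch of the general technique, not the allocator's exact code): the top bits act like an exponent and the next few bits like a mantissa, so small sizes get exact bins and large sizes get logarithmically spaced bins.

```cpp
#include <bit>
#include <cstdint>

static constexpr uint32_t MANTISSA_BITS  = 3;
static constexpr uint32_t MANTISSA_VALUE = 1u << MANTISSA_BITS;

// Map an allocation size to a bin index, rounding up so the allocation
// always fits in the bin's free blocks.
uint32_t sizeToBinRoundUp(uint32_t size)
{
    if (size < MANTISSA_VALUE)
        return size; // small sizes map 1:1 to bins ("denormals")

    uint32_t exp      = std::bit_width(size) - 1;                      // position of the highest set bit
    uint32_t mantissa = (size >> (exp - MANTISSA_BITS)) & (MANTISSA_VALUE - 1);
    uint32_t bin      = ((exp - MANTISSA_BITS + 1) << MANTISSA_BITS) | mantissa;

    // If any bits below the mantissa were dropped, bump to the next bin.
    uint32_t droppedMask = (1u << (exp - MANTISSA_BITS)) - 1;
    if (size & droppedMask)
        ++bin;
    return bin;
}
```

With 3 mantissa bits the worst-case rounding waste per allocation stays around 1/8 of the size, and the bin index splits naturally into a two-level bitfield lookup.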
I haven't read the TLSF paper or implementation, so they might have some additional tricks that I didn't implement. Also, my allocator doesn't embed the metadata in the allocated memory, since my allocator is not a memory allocator, it's an offset allocator. No backing memory.
This means that I must use a separate array to store my nodes and the freelist, which is both a good thing and a bad thing. The good thing is that you can use this to allocate GPU memory or any other type of resource that requires sequential slot allocation.
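The external node storage is roughly this shape (an assumed layout for illustration, not the actual implementation):

```cpp
#include <cstdint>
#include <vector>

// One node per free or used range. No headers inside the managed memory:
// all metadata lives in this flat array, linked by indices instead of pointers.
struct OffsetAllocNode
{
    uint32_t offset       = 0;
    uint32_t size         = 0;
    uint32_t binListNext  = 0xFFFFFFFF; // next node in the same size bin
    uint32_t binListPrev  = 0xFFFFFFFF;
    uint32_t neighborNext = 0xFFFFFFFF; // physically adjacent ranges, for merging on free
    uint32_t neighborPrev = 0xFFFFFFFF;
    bool     used         = false;
};

struct NodeStorage
{
    std::vector<OffsetAllocNode> nodes;           // preallocated, fixed capacity
    std::vector<uint32_t>        freeNodeIndices; // freelist of unused node slots

    explicit NodeStorage(uint32_t capacity)
    {
        nodes.resize(capacity);
        freeNodeIndices.resize(capacity);
        for (uint32_t i = 0; i < capacity; ++i)
            freeNodeIndices[i] = capacity - 1 - i; // lowest index popped first
    }

    uint32_t acquireNode()                        // assumes capacity is not exceeded
    {
        uint32_t index = freeNodeIndices.back();
        freeNodeIndices.pop_back();
        return index;
    }

    void releaseNode(uint32_t index)
    {
        nodes[index] = OffsetAllocNode{};
        freeNodeIndices.push_back(index);
    }
};
```

Because allocations are identified by a node index and an offset rather than a pointer, the same allocator can sub-allocate a GPU buffer, a descriptor heap, or anything else addressed by offset.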