First rough test results on Android. New vs old renderer. 4000 draw call scene.

This is still lacking most data model optimizations. They are coming later...

Adreno 610: 46 fps -> 60 fps (+30%)
Mali G57: 43 fps -> 45 fps (+5%)
PowerVR GE8320: 19 fps -> 48 fps (+152%)
Interesting to see such small gains on G57 (Valhall 1).

Valhall 3 brought a big frontend update which improved Vulkan performance:
community.arm.com/arm-community-…

Have to profile to see the bottleneck.
The GE8320 Z-acne issue is likely caused by the 24 bit Z buffer + some missing Z bias state in my code, which makes the shadow maps produce Z-acne on the surface. Will investigate that. GE8320 doesn't support 32+8 Z+stencil, so I have to use 24+8.
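
As an illustration of the 24+8 fallback, a minimal Vulkan sketch could look like this (the function name and the preference order are my own, not HypeHype's code):

    #include <vulkan/vulkan.h>

    // Pick the first depth+stencil format the device supports as a depth attachment.
    // Prefer 32+8, fall back to 24+8 (e.g. on PowerVR GE8320).
    VkFormat pickDepthStencilFormat(VkPhysicalDevice physicalDevice)
    {
        const VkFormat candidates[] = {
            VK_FORMAT_D32_SFLOAT_S8_UINT, // 32 bit float depth + 8 bit stencil
            VK_FORMAT_D24_UNORM_S8_UINT,  // 24 bit unorm depth + 8 bit stencil
        };
        for (VkFormat format : candidates)
        {
            VkFormatProperties props;
            vkGetPhysicalDeviceFormatProperties(physicalDevice, format, &props);
            if (props.optimalTilingFeatures & VK_FORMAT_FEATURE_DEPTH_STENCIL_ATTACHMENT_BIT)
                return format;
        }
        // The Vulkan spec requires at least one of the two formats above to be supported.
        return VK_FORMAT_UNDEFINED;
    }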
Will do proper profiling later with actual GPU tools. I just got Android working today, and I'm running initial tests to see that things work correctly. I am glad that there are no perf regressions on the low end.
Fixed the GE8320 issue. It was actually an interpolation precision issue with the material ID. Now it renders correctly. No perf impact.

More from @SebAaltonen

Feb 7
Currently I invalidate my handles (bump the generation index) when the gfx resource is destroyed. The resource itself is put into a delete queue and waits until the GPU has finished that frame.

I am considering deferring the handle invalidation too...
Currently I push all passes and draw commands to big queues and these queues are processed later. This will be done in threads. It's convenient to be able to delete resources (render targets, passes, textures, buffers) before the handles are dereferenced.
Currently you can't delete the resources (handles) immediately, as the deferred rendering would try to deref the handles and get a null, since the generation bits no longer match. But the resource is still there, because of the deferred deletion.
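
For illustration, a generation-indexed handle scheme along those lines could look roughly like this (names and bit widths are my own sketch, not the HypeHype code):

    #include <cstdint>
    #include <vector>

    // Handle = slot index + generation counter packed into 32 bits.
    struct Handle
    {
        uint32_t index : 24;
        uint32_t generation : 8;
    };

    template <typename T>
    struct ResourcePool
    {
        struct Slot { T resource; uint8_t generation = 0; };
        std::vector<Slot> slots;

        // Deref: returns null when the generation bits don't match (stale handle).
        T* get(Handle h)
        {
            Slot& slot = slots[h.index];
            return (slot.generation == h.generation) ? &slot.resource : nullptr;
        }

        // Invalidate the handle immediately by bumping the generation. The resource
        // itself would go to a delete queue and be freed after the GPU frame ends.
        void destroy(Handle h)
        {
            slots[h.index].generation++;
            // pushToDeleteQueue(h); // hypothetical deferred deletion
        }
    };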
Feb 6
This is what happens when you try to do two refactorings at the same time. I was lazy and tried to save some time :)

The following Metal object is being destroyed while still required to be alive by the command buffer 0x134808e00: [...] label = CAMetalLayer Drawable
Everything works fine on Metal. It's just the Vulkan backend (running here on top of MoltenVK) that broke. I also merged all my Vulkan changes that made it run on Android phones AND refactored the draw stream bindings. You get what you ask for :)
198 changed files. This is fine :)
Feb 6
Refactored my main display pass and present APIs.

Now I have a separate API for starting the main display pass. It returns the swap chain render pass handle and a command buffer handle.

Thread...
On Metal, starting the main display pass acquires the drawable. Acquiring the drawable on Metal causes a CPU stall if the swap chain buffer is not available, so you need to do it as late as possible. This way offscreen passes can be pushed to the GPU before the CPU stall.
Presenting the display is also a command in Metal. It needs to be pushed to the command buffer. Now I don't have a separate present API anymore. The main command buffer submit writes a present command at the end of the command buffer automatically.
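
A rough sketch of that API shape, using placeholder handle types (the actual HypeHype names and signatures may differ):

    #include <cstdint>

    struct RenderPassHandle    { uint32_t value = 0; };
    struct CommandBufferHandle { uint32_t value = 0; };

    struct MainDisplayPass
    {
        RenderPassHandle    renderPass;     // swap chain render pass handle
        CommandBufferHandle commandBuffer;  // command buffer to record the pass into
    };

    // Starts the main display pass. On Metal this is the point where the drawable is
    // acquired, so it is called as late as possible, after the offscreen passes have
    // already been pushed to the GPU.
    MainDisplayPass beginMainDisplayPass();

    // No separate present API: submitting the main command buffer writes the present
    // command at the end of the command buffer automatically.
    void submitMainCommandBuffer(CommandBufferHandle commandBuffer);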
Jan 23
Metal fences now work. No corruption anymore.

But my implementation is dead simple. I have a fence between all render passes. I allow the next render pass to run vertex shaders before the previous pass finishes. This is the biggest optimization you want on mobile TBDR GPUs.
But this is not yet shippable. Currently HypeHype doesn't sample render target textures in the vertex shader, but somebody could implement a shader like that, and my dead simple fence implementation would fail.
I guess it's time for another uint64 bitfield for used render targets. Store that in bind groups and render passes. I could have an array of 64 fences: update a fence at the end of a render pass for each RT that has store op != don't care, and wait on it when that RT is used next.
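
A backend-agnostic sketch of that bookkeeping might look like this; the Fence type and the encoder calls in the comments are placeholders, not the actual Metal backend code:

    #include <bit>
    #include <cstdint>

    struct Fence; // backend fence object (e.g. MTLFence on Metal)

    struct RenderTargetFenceTracker
    {
        Fence* fences[64] = {}; // one fence slot per tracked render target

        // Before encoding a pass: wait on the fence of every render target the pass
        // samples (bits collected from its bind groups' usedRenderTargets bitfields).
        void waitForUsedTargets(uint64_t usedRenderTargets)
        {
            while (usedRenderTargets)
            {
                uint32_t rt = std::countr_zero(usedRenderTargets); // lowest set bit
                usedRenderTargets &= usedRenderTargets - 1;        // clear it
                // encoder->waitForFence(fences[rt], /*before*/ vertexStage);
                (void)rt;
            }
        }

        // At the end of a pass: update the fence of every render target the pass
        // stored (store op != don't care).
        void signalWrittenTargets(uint64_t writtenRenderTargets)
        {
            while (writtenRenderTargets)
            {
                uint32_t rt = std::countr_zero(writtenRenderTargets);
                writtenRenderTargets &= writtenRenderTargets - 1;
                // encoder->updateFence(fences[rt], /*after*/ fragmentStage);
                (void)rt;
            }
        }
    };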
Jan 23
Based on feedback, it seems that nobody is complaining about the allocator algorithmic details or code clarity, but people are complaining about these two delete[] calls :)

github.com/sebbbi/OffsetA…

Not going to include the std::unique_ptr header just to remove two lines of trivial code.
Yes, I know that I must implement a move constructor. That's an extra 4 lines of code.

But this code isn't even tested yet. Going to do that once I have written a test suite and fixed all the bugs. Those things have to wait.
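
For illustration, a move constructor for a class owning two raw arrays could look roughly like this (the member names are assumptions based on the thread, not the actual OffsetAllocator fields):

    #include <cstdint>
    #include <utility>

    class Allocator
    {
    public:
        struct Node { uint32_t offset = 0; uint32_t size = 0; };

        explicit Allocator(uint32_t maxAllocs)
            : m_nodes(new Node[maxAllocs])
            , m_freeNodes(new uint32_t[maxAllocs])
        {
        }

        // Move constructor: steal the arrays and null out the source, so the
        // moved-from object's delete[] calls become harmless no-ops.
        Allocator(Allocator&& other) noexcept
            : m_nodes(std::exchange(other.m_nodes, nullptr))
            , m_freeNodes(std::exchange(other.m_freeNodes, nullptr))
        {
        }

        Allocator(const Allocator&) = delete;
        Allocator& operator=(const Allocator&) = delete;

        ~Allocator()
        {
            delete[] m_nodes;
            delete[] m_freeNodes;
        }

    private:
        Node*     m_nodes = nullptr;
        uint32_t* m_freeNodes = nullptr;
    };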
I have had bad experiences using single header libraries that lean a lot on std headers. Magic enum, phmap and similar add a massive cost to the compile time due to their dependencies on complex std headers. I made the HypeHype compile time 2x faster last summer by cutting std dependencies.
Jan 23
This is the memory allocator comparison paper I mentioned in my threads. My allocator should be similar to TLSF, since I also use a two-level bitfield and a floating point distribution for the bins. But I don't tie the bitfield levels 1:1 to the float mantissa/exponent.

researchgate.net/publication/23…
I haven't read the TLSF paper or implementation, so they might have some additional tricks that I didn't implement. Also, my allocator doesn't embed the metadata in the allocated memory, since my allocator is not a memory allocator, it's an offset allocator. No backing memory.
This means that I must use a separate array to store my nodes and freelist, which is both a good thing and a bad thing. The good thing is that you can use this to allocate GPU memory or any other type of resource that requires sequential slot allocation.
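
A simplified sketch of the two-level bitfield part (finding the first non-empty bin at or above a minimum bin index); the float-style mapping from allocation size to bin index is left out, and the layout is my own illustration, not a copy of the OffsetAllocator code:

    #include <bit>
    #include <cstdint>

    // 256 bins tracked by a two-level bitfield: the top level has one bit per group of
    // 8 bins, the bottom level has an 8-bit mask per group.
    struct TwoLevelBinBitfield
    {
        uint32_t topBits = 0;       // bit g set = group g has at least one non-empty bin
        uint8_t  lowBits[32] = {};  // bit b of lowBits[g] set = bin g*8+b is non-empty

        void setBin(uint32_t bin)
        {
            lowBits[bin >> 3] |= uint8_t(1u << (bin & 7));
            topBits |= 1u << (bin >> 3);
        }

        void clearBin(uint32_t bin)
        {
            lowBits[bin >> 3] &= uint8_t(~(1u << (bin & 7)));
            if (lowBits[bin >> 3] == 0)
                topBits &= ~(1u << (bin >> 3));
        }

        // Returns the first non-empty bin with index >= minBin, or ~0u if none exists.
        // Used to find the smallest free block that is still large enough.
        uint32_t findFirstBinAtOrAbove(uint32_t minBin) const
        {
            uint32_t group = minBin >> 3;

            // Check the remaining bins of the first group.
            uint32_t masked = lowBits[group] & uint32_t(~0u << (minBin & 7));
            if (masked != 0)
                return (group << 3) + std::countr_zero(masked);

            // Otherwise jump to the next non-empty group via the top-level bits.
            uint32_t higherGroups = (group + 1 < 32) ? (topBits & (~0u << (group + 1))) : 0u;
            if (higherGroups == 0)
                return ~0u;

            group = std::countr_zero(higherGroups);
            return (group << 3) + std::countr_zero(uint32_t(lowBits[group]));
        }
    };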