Finland was a key tech player 20 years ago: we invented the SSH and IRC protocols. Nokia was the EU's most valuable company, selling more phones yearly than Apple and Samsung sell today combined. We invented the OS that runs most internet servers today. Nokia failed and Linux is free...
Finland has some new successes: Wolt is the biggest EU food delivery service, Oura was the first health ring, and Silo AI is one of the EU's biggest AI companies. Wolt got sold to DoorDash ($3.5B), Silo AI got sold to AMD ($665M). Oura is still an $11B Finnish company.
We also had a Finnish Facebook before Facebook. IRC-Galleria was used by all young adults when I was studying at university. Most girls I knew used it, which should have told investors a lot. But they never took it international.
It's sad: Finland could have been an internet superpower, but Finnish companies made the wrong moves and the next wave of Finnish companies got sold. The only remaining big things, SSH and Linux, are free. And Oura is heavily challenged by Apple and Samsung smart rings.
IRC lost to MSN Messenger. The silver lining is that Netscape wasn't Finnish. That would have made perfect sense :D
Finland also had a GPU company: Bitboys. It got sold to ATI, which was then bought by AMD. It was an EDRAM-based design. Xbox 360 devs might still remember it. It was later sold to Qualcomm (Adreno). As we all know, Adreno is an anagram of Radeon.
I have realized that there aren't that many people out there who understand the big picture of modern GPU hardware and all the APIs: Vulkan 1.4 with the latest extensions, DX12 SM 6.6, Metal 4, OpenCL and CUDA. What is the hardware capable of? What should a modern API look like?
My "No Graphics API" blog post will discuss all of this. My conclusion is that Metal 4.0 is actually closest to the goal. It has flaws too. DX12 SM 6.6 doesn't have those particular flaws, but has a lot of other flaws. Vulkan has all the flaws combined, with useful extensions :)
Of course WebGPU doubled down on Vulkan's design mistakes. Bind groups are immutable and there are no escape hatches for dynamic bindings. No persistently mapped GPU memory. And a brand new shader language without 64-bit pointer support.
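To make the bind group complaint concrete, here's a minimal sketch assuming the webgpu.h C API (makeBindGroup is my own helper name, not from the thread). Because bind groups are baked at creation, pointing a binding at a different buffer means building an entirely new bind group object; there is no way to patch a single slot in place:

```cpp
#include <webgpu/webgpu.h>

WGPUBindGroup makeBindGroup(WGPUDevice device, WGPUBindGroupLayout layout,
                            WGPUBuffer buffer, uint64_t size) {
    WGPUBindGroupEntry entry = {};
    entry.binding = 0;        // slot 0 of the layout
    entry.buffer  = buffer;
    entry.offset  = 0;
    entry.size    = size;

    WGPUBindGroupDescriptor desc = {};
    desc.layout     = layout;
    desc.entryCount = 1;
    desc.entries    = &entry;
    // The returned bind group is immutable: swapping the buffer next
    // frame means calling this again and creating a fresh object.
    return wgpuDeviceCreateBindGroup(device, &desc);
}
```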
The past decades have been a wonderful time for gamers+devs. The biggest chips, using the latest nodes and billions worth of R&D, were all targeted at gaming. Now, those chips are needed by professionals (AI). We'll never see a big-die GPU at a reasonable price point anymore :(
The fun lasted for a very long time, but it's over on both the CPU and GPU side. The biggest CPU and GPU dies are no longer designed for gamers. A top-end Threadripper costs over $10k today. A top-end Nvidia B200 costs over $30k. A few generations ago, top-tier HW was targeting gamers :(
AMD no longer produces big-die GPUs for gamers. Nvidia has a low-volume $2500+ halo product. But it's much smaller than Nvidia's B200 GPU, which has two glued-together dies, each slightly bigger than an RTX 5090. Chiplet GPUs, Threadripper style, are coming. Will gaming GPUs be limited to a few chiplets?
Unit tests have lots of advantages, but the cons are often ignored:
- Code must be split into testable parts, often requiring more interfaces, which add code bloat and complexity.
- Each call site is a dependency. Test case = +1 dependency. Added inertia to refactor and throw away code.
- Bloated unit test suites taking several hours to execute. This slows down devs and causes merge conflicts as pushes are delayed.
- Unstable tests randomly failing pushes.
- Unit test maintenance and optimization are needed to keep tests manageable. Otherwise developer velocity suffers.
It's crucial to make your unit tests fast. Don't load files from disk, and definitely don't do network requests. Embed the data (with a bin->hdr tool, for example). If your whole test suite runs in <10 seconds, you are golden. But writing good optimized tests like this takes effort.
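For illustration, here's a minimal sketch of what such a bin->hdr tool could look like (bin2hdr and kTestData are invented names): a build step converts a binary test asset into a C++ header, so tests parse from embedded memory instead of touching the disk.

```cpp
// bin2hdr: emit a C++ header embedding the bytes of a binary file.
#include <cstdio>

int main(int argc, char** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: bin2hdr <in.bin> <out.h>\n");
        return 1;
    }
    std::FILE* in  = std::fopen(argv[1], "rb");
    std::FILE* out = std::fopen(argv[2], "w");
    if (!in || !out) return 1;

    std::fprintf(out, "static const unsigned char kTestData[] = {");
    int c, n = 0;
    while ((c = std::fgetc(in)) != EOF)
        std::fprintf(out, "%s0x%02x,", (n++ % 16 == 0) ? "\n    " : " ", c);
    std::fprintf(out, "\n};\n"
                      "static const unsigned long kTestDataSize = sizeof(kTestData);\n");
    std::fclose(in);
    std::fclose(out);
    return 0;
}
```

A test then includes the generated header and reads kTestData directly: no disk I/O, no network, no fixtures to deploy.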
When you split a function into N small functions, the reader also suffers multiple "instruction cache" misses (just like the CPU executing it). They need to jump around the code base to continue reading. Big linear functions are fine. Code should read like a book.
Messy big functions with lots of indentation (loops, branches) should be avoided. Extracting is a good practice here. But often functions like this are a code smell. Why do you need all those branches? Why is the function doing so many unrelated things? Maybe it's too generic? Refactor?
There's a rule of thumb that you write separate code for each call site until you have repeated yourself three times; then you merge them together. But people often forget the opposite: you have to split a function when the call sites' requirements diverge. Don't add more branches!
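A tiny hypothetical example of that last point (the pricing functions are invented for illustration): when a second call site needs different behavior, split instead of growing the shared function with flags and branches.

```cpp
// Before: one shared function that grows a branch per call site.
float price(float base, bool isMember, bool isHoliday) {
    float p = base;
    if (isMember)  p *= 0.9f;  // added for call site A
    if (isHoliday) p *= 0.8f;  // added for call site B
    return p;
}

// After: each call site gets its own straight-line function.
float memberPrice(float base)  { return base * 0.9f; }
float holidayPrice(float base) { return base * 0.8f; }
```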
WebGPU CPU->GPU update paths are designed to be super hard to use. Map is async and you should not wait for it, so you can't map->write->render in the same frame.
wgpuQueueWriteBuffer runs on the CPU timeline. You need to wait for a callback to know the buffer is no longer in use.
Waiting for a callback is not recommended on the web, and there's no API for asking how many frames you have in flight. So you have to dynamically create new staging buffers (in a ring) based on callbacks to use wgpuQueueWriteBuffer safely. Otherwise it will trash data the GPU is still using.
You are not allowed to map or wgpuQueueWriteBuffer even a different region of a buffer used by any GPU frame in flight. You need an entirely different buffer.
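Here's a minimal sketch of that ring, assuming an older webgpu.h revision (newer headers moved to callback-info structs, so the wgpuQueueOnSubmittedWorkDone signature differs); StagingRing and its members are invented names. The ring grows on demand because you can't query how many frames are in flight, and a slot is recycled only when its work-done callback fires:

```cpp
#include <webgpu/webgpu.h>
#include <deque>

struct StagingRing {
    struct Slot { WGPUBuffer buffer; bool busy; };
    WGPUDevice device;
    uint64_t size;            // size of each staging buffer
    std::deque<Slot> slots;   // deque keeps Slot* stable while growing

    // Reuse a free slot, or grow the ring if every buffer is still in
    // flight. The ring size thus adapts to however many frames the
    // implementation keeps in flight.
    Slot* acquire() {
        for (Slot& s : slots)
            if (!s.busy) { s.busy = true; return &s; }
        WGPUBufferDescriptor desc = {};
        desc.usage = WGPUBufferUsage_CopySrc | WGPUBufferUsage_CopyDst;
        desc.size  = size;
        slots.push_back({ wgpuDeviceCreateBuffer(device, &desc), true });
        return &slots.back();
    }
};

// Write into a fresh staging buffer (never one the GPU may still read)
// and record a copy into the destination buffer.
StagingRing::Slot* upload(StagingRing& ring, WGPUQueue queue,
                          WGPUCommandEncoder enc, WGPUBuffer dst,
                          const void* data, uint64_t bytes) {
    StagingRing::Slot* slot = ring.acquire();
    wgpuQueueWriteBuffer(queue, slot->buffer, 0, data, (size_t)bytes);
    wgpuCommandEncoderCopyBufferToBuffer(enc, slot->buffer, 0, dst, 0, bytes);
    return slot;
}

// Call after wgpuQueueSubmit() of the commands recorded above: the
// slot becomes reusable once the GPU work-done callback fires.
void recycleOnWorkDone(WGPUQueue queue, StagingRing::Slot* slot) {
    wgpuQueueOnSubmittedWorkDone(queue,
        [](WGPUQueueWorkDoneStatus, void* userdata) {
            static_cast<StagingRing::Slot*>(userdata)->busy = false;
        }, slot);
}
```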
Refactored our CommandBuffer interface to support compute. Final result:
A compute pass contains N dispatches, just like a render pass contains N draws (split into areas = viewports).
The render pass object is static (due to Vulkan 1.0). Compute has a dynamic write resource list.
This is how you would use the API to dispatch a compute pass with a single compute shader writing to two SSBOs.
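The original tweet showed the usage as a code screenshot, which isn't preserved here. Below is a hypothetical reconstruction from the description alone; every name (CommandBuffer::computePass, Dispatch, Buffer, etc.) is invented, but it matches the stated shape: one call per pass, a dynamic write resource list, and N dispatches.

```cpp
#include <cstdint>
#include <span>

struct Buffer;           // opaque GPU buffer handle
struct ComputePipeline;  // opaque compute pipeline handle

struct Dispatch {
    ComputePipeline* pipeline;
    uint32_t groupsX, groupsY, groupsZ;
};

struct CommandBuffer {
    // One virtual call per compute pass: the pass declares which
    // resources it writes and the dispatches it contains.
    virtual void computePass(std::span<Buffer* const> writeResources,
                             std::span<const Dispatch> dispatches) = 0;
};

void recordPass(CommandBuffer& cb, ComputePipeline& shader,
                Buffer& ssbo0, Buffer& ssbo1) {
    // Arrays live in the caller's stack; spans are just ptr + size.
    Buffer*  writes[]     = { &ssbo0, &ssbo1 };
    Dispatch dispatches[] = { { &shader, 256, 1, 1 } };
    cb.computePass(writes, dispatches);
}
```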
This new API requires one virtual function call per pass, which is not a problem. The passed data commonly lives on the stack or is temp allocated (frame bump allocator). No copies (a span is just ptr + size). And initializer lists (if used) live in the caller's stack.