Huawei's VP of Cyber Security was interviewed by @KauppalehtiFi. He proposed 7-day (80h) weeks with no weekends. He hasn't had a summer vacation in four years. And he is worried that Finland can't compete with China or Silicon Valley without doing the same.
Thread...
I can only talk about programmers here, as I am not familiar with other professions. Programmers write bad code when they are stressed out. They can still write a lot of code, but the architecture designs are terrible and there's a lot of technical debt.
The only way to make programmers think about the architecture (the bigger picture) is to give them some breathing room. People I know often get their best ideas outside of work (walking the dog, swimming, biking home, etc). The best ideas never occur under stress.
If the whole team is doing 80-hour weeks regularly, the amount of technical debt will be massive, and it will require a massive amount of people to maintain that code. If you do it like this for many years, it will definitely feel like nobody has time to idle. So many bugs to fix.
And nobody has the energy or motivation to make big technical changes to get the company out of this situation. It's a death spiral. No exit. Everybody thinks they are doing a lot of work, but it's just busy work. With better design, none of this work would be needed.
If you really want good long-term velocity, you need to give your programmers breathing room to make good architectural decisions. If you are measuring lines of code written, you are doing it wrong. More lines are almost never better. They just need more maintenance.
I have bad motion sickness, and this marketing picture makes me feel nauseous. I don't want a car with more screens. If I sit in the back seat, I sit in the middle and stare at the road. Same when I am driving the car: I stare at the road. Give me a projected HUD instead of this.
These big panorama glass roofs, on the other hand, help with motion sickness. I am glad that many new cars have them. Many new EVs have projected HUDs too; Tesla doesn't have one yet. You need to glance at the screen regularly, which is painful for motion-sick people.
Because of my motion sickness I have developed a very smooth driving style. It was easy to drive like this in other EVs, since they let you control the regenerative braking balance (between the gas and brake pedals). But Tesla removed this in the 2021 models. Big red flag for me.
With resizable BAR support getting more adoption, standard swizzle is becoming more important too: docs.microsoft.com/en-us/windows/…
With standard swizzle, you can do the swizzle on the CPU side (even store swizzled textures on disk) and write to GPU memory directly without a copy.
I am just wondering how good the standard swizzle support is nowadays. AFAIK only Intel supported this feature in the beginning. What's the current situation? Is it supported on Nvidia and AMD? If yes, is it fast?
If the optimal tiling layout is 5%+ faster than standard swizzle, then there's no point in using standard swizzle. Just pay the GPU copy cost for better runtime performance. But if the cost is tiny, then simply use RBAR memory for everything :)
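As a rough illustration, here's a hedged C++/D3D12 sketch of the idea (my own helper names, error handling omitted): check for standard swizzle support, then create a texture with the 64KB standard swizzle layout in CPU-visible VRAM so pre-swizzled texel data can be written directly, skipping the GPU-side copy. The custom heap setup assumes resizable BAR exposes VRAM as CPU-writable; that part is my assumption, not something stated in the thread.

```cpp
#include <d3d12.h>

// Is D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE supported on this device?
bool StandardSwizzleSupported(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    return SUCCEEDED(device->CheckFeatureSupport(
               D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts))) &&
           opts.StandardSwizzle64KBSupported;
}

ID3D12Resource* CreateSwizzledTexture(ID3D12Device* device,
                                      UINT width, UINT height)
{
    // CPU-visible VRAM (assumes ReBAR): custom heap, write-combined
    // pages, L1 (video) memory pool.
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_CUSTOM;
    heapProps.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE;
    heapProps.MemoryPoolPreference = D3D12_MEMORY_POOL_L1;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    desc.Width = width;
    desc.Height = height;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    // The point of the thread: a documented swizzle pattern the CPU can
    // replicate, so Map() + memcpy of pre-swizzled data is all you need.
    desc.Layout = D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE;

    ID3D12Resource* tex = nullptr;
    device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                    D3D12_RESOURCE_STATE_COMMON, nullptr,
                                    IID_PPV_ARGS(&tex));
    return tex;
}
```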
Why is it better to have the full GPU memory visible from the CPU side, compared to a small 256 MB region?
Thread...
Traditionally, people allocate an upload heap, which is CPU system memory visible to the GPU.
The CPU writes data there, and the GPU can directly read the data over PCI-E bus. Recently I measured 28 GB/s GPU read bandwidth from CPU system memory over PCI-E 4.0.
The two most common use cases are:
1. Dynamic data: CPU writes to upload heap. GPU reads it from there directly in pixel/vertex/compute shader. Examples: constant buffers, dynamic vertex data...
2. Static data: CPU writes to upload heap. GPU timeline copy to a GPU resource. A sketch of case 1 is below.
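Here's a minimal C++/D3D12 sketch of the upload heap pattern for case 1 (my own helper names, error handling omitted): create a buffer on an upload heap, map it, and memcpy dynamic data that the GPU then reads directly over PCI-E.

```cpp
#include <d3d12.h>
#include <cstring>

ID3D12Resource* CreateUploadBuffer(ID3D12Device* device, UINT64 size)
{
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_UPLOAD; // CPU system memory, GPU-visible

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width = size;
    desc.Height = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.Format = DXGI_FORMAT_UNKNOWN;
    desc.SampleDesc.Count = 1;
    desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // required for buffers

    ID3D12Resource* buffer = nullptr;
    device->CreateCommittedResource(
        &heapProps, D3D12_HEAP_FLAG_NONE, &desc,
        D3D12_RESOURCE_STATE_GENERIC_READ, // upload heap resources start here
        nullptr, IID_PPV_ARGS(&buffer));
    return buffer;
}

void WriteDynamicData(ID3D12Resource* buffer, const void* src, size_t bytes)
{
    void* mapped = nullptr;
    D3D12_RANGE noRead = {0, 0};      // we never read write-combined memory
    buffer->Map(0, &noRead, &mapped);
    std::memcpy(mapped, src, bytes);  // GPU shaders read this directly
    buffer->Unmap(0, nullptr);
}
```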
I am going to implement a depth pyramid based approach first.
Would also like to test the new Nvidia extension that eliminates all pixels of a triangle after the first one passes. This way you don't even need a depth pyramid. Just write to a visible bitfield in the pixel shader.
New GPU culling algorithm:
1. Render last frame's visible list
2. Generate depth pyramid from the Z buffer
3. Do a 2x2 sample test for each instance using gather (refer to my SIGGRAPH 2015 presentation)
4. Write newly visible instances also to buffer B
5. Render visible list B
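A hypothetical CPU-side reference of the step 3 test (the real version would run in a compute shader; the pyramid layout, helper names, and depth convention here are my assumptions, not from the thread):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct DepthPyramid {
    // mips[0] is full-resolution depth; each following mip stores the max
    // (farthest, with standard 0=near/1=far depth) of its 2x2 footprint,
    // so a texel is a conservative "farthest occluder" for its pixels.
    std::vector<std::vector<float>> mips;
    std::vector<uint32_t> widths, heights;
    float Load(uint32_t mip, uint32_t x, uint32_t y) const {
        x = std::min(x, widths[mip] - 1);   // clamp at the screen edge
        y = std::min(y, heights[mip] - 1);
        return mips[mip][y * widths[mip] + x];
    }
};

// Screen-space bounds of an instance in pixels, plus its nearest depth.
struct InstanceBounds { float minX, minY, maxX, maxY, minDepth; };

bool IsVisible(const DepthPyramid& pyr, const InstanceBounds& b)
{
    // Pick the mip where the bounds cover at most 2x2 texels.
    float sizePx = std::max(b.maxX - b.minX, b.maxY - b.minY);
    uint32_t mip = (uint32_t)std::ceil(std::log2(std::max(sizePx, 1.0f)));
    mip = std::min(mip, (uint32_t)(pyr.mips.size() - 1));

    uint32_t x0 = (uint32_t)b.minX >> mip;
    uint32_t y0 = (uint32_t)b.minY >> mip;

    // 2x2 "gather": farthest occluder depth over the footprint.
    float occluder = std::max(
        std::max(pyr.Load(mip, x0, y0),     pyr.Load(mip, x0 + 1, y0)),
        std::max(pyr.Load(mip, x0, y0 + 1), pyr.Load(mip, x0 + 1, y0 + 1)));

    // Occluded only if the instance's nearest point is behind the farthest
    // occluder everywhere in the footprint; otherwise treat it as visible.
    return b.minDepth <= occluder;
}
```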
What should I implement next in my Rust Vulkan prototype?
It has plenty of occlusion potential, even though it's a sparse asteroid field of 1 million instances. Should be able to cull 90%+ easily...
I need the occlusion culling for efficient rendering of the sparse volume. Otherwise the brick raster results in overdraw. However, the backfaces of SDF bricks terminate the root finding immediately, as the ray starts from the inside. Could early-out the normal calculation too...
It was about image kernels and their memory access patterns. Filled with GCN architecture specifics, but the most noteworthy detail was the LDS sliding window algorithm.
Thread...
Blur kernels are very popular, and the most annoying part of writing one is avoiding fetching the neighborhood again and again. Tiny changes in execution order can have a massive effect on cache utilization. The problem is especially tricky in separable X/Y Gaussian blurs.
A naive separable Gaussian blur fetches a long strip along the X axis. Each pixel does the same. Pixels at Y and Y+n share zero input pixels with each other. Pixels along the X axis share inputs. But if the kernel is wide enough, it's hard to keep all of that data reliably in caches.
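A scalar C++ reference of that naive horizontal pass, just to make the access pattern concrete (names are mine, not from the talk): every output pixel re-reads a strip of 2r+1 inputs, so neighbors on the same row share almost all of their fetches while rows share none. This redundant re-fetching is exactly what an LDS sliding window would keep on chip instead.

```cpp
#include <algorithm>
#include <vector>

// Naive horizontal Gaussian pass. kernel has 2r+1 weights.
void BlurRowsNaive(const std::vector<float>& src, std::vector<float>& dst,
                   int width, int height, const std::vector<float>& kernel)
{
    int r = (int)kernel.size() / 2;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -r; k <= r; ++k) {
                // Each input texel is re-fetched by 2r+1 neighboring
                // outputs on the same row; clamp at the image border.
                int xs = std::clamp(x + k, 0, width - 1);
                sum += kernel[k + r] * src[y * width + xs];
            }
            dst[y * width + x] = sum;
        }
}
```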