In graphics programming we use a lot of awesome-sounding names for techniques, which often trigger fantastic mental imagery (as well as actual imagery). There are too many to list them all, but my top 3 favourites, in no particular order, probably are: (1/4)
1) "Ambient occlusion": the percentage of rays cast from a point over the hemisphere centred around the surface normal that are not occluded (do not collide with) by geometry. A value of 0 means all rays collide, 1 means none does. (2/4)
2) "Shadow pancaking": project shadowcasting meshes that lie in front of the near plane of a light (and would normally get culled), on the near plane so that they will still cast shadows. Used to enable tightening of the shadow projection volume to increase the resolution. (3/4)
During my years in graphics there have been many great conference presentations, and a few that I found "eye opening" and that changed the way I think about and approach gfx programming. My top 3, in no particular order, probably are (1/4):
People starting to learn graphics techniques and a graphics API to implement them may find the whole process intimidating. In such a case there is the option to use a rendering framework that hides the API complexity, and handles asset and resource management. (1/4)
There are quite a few frameworks out there, for example: (2/4)
Some are closer to the API, some hide it completely. They still offer the opportunity to learn about asset loading, shaders, render states, render targets etc at a more granular level than a full-blown engine, while allowing the user to focus on the gfx tech implementation. (3/4)
Good DM question: "is it better to dispatch 1 threadgroup with 100 threads or 100 groups with 1 thread in each?" The GPU will assign a threadgroup to a Compute Unit (or SM), and will batch its threads into wavefronts (64 threads, on AMD GCN) or warps (32 threads, on Nvidia). (1/4)
Those wavefronts/warps are executed on the CU's SIMDs, 64/32 threads at a time (per clock), in lockstep. If you have only one thread in the threadgroup you will waste most of the wavefront/warp, as a wavefront/warp can't contain threads from different threadgroups. (2/4)
The general advice is that the threadgroup should fill at least a few wavefronts/warps, e.g. 128/256 threads on GCN. The ideal number also depends on the registers used per thread (to achieve good occupancy) and on whether the threads of the group need to share data or not. (3/4)
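To see this concretely in CUDA terms (threadgroup = thread block, warp = 32 threads), here is a sketch with a trivial stand-in kernel; both launches below run 100 threads in total, but waste very different amounts of hardware:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(float* data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 100) data[i] *= 2.0f; // trivial per-thread work
}

int main()
{
    float* data = nullptr;
    cudaMalloc(&data, 100 * sizeof(float));

    // 1 block of 100 threads: ceil(100/32) = 4 warps on one SM,
    // three of them full and the last one only 4/32 occupied.
    work<<<1, 100>>>(data);

    // 100 blocks of 1 thread: every block becomes its own warp with 1 of its
    // 32 lanes active, since a warp cannot mix threads from different blocks.
    // Roughly 97% of the SIMD lanes do nothing.
    work<<<100, 1>>>(data);

    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```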
A mini Importance Sampling adventure: imagine a signal that we need to integrate (sum its samples) over its domain. It could for example be an environment map convolution for diffuse lighting. (1/6)
Capturing and processing many samples is expensive, so we often randomly select a few and sum those only. If we select which samples to use uniformly (with the same probability) though, we risk missing important features in the signal, e.g. areas with large radiance. (2/6)
If the signal is non-negative (like an image), we can normalise its values (divide by the sum of all values) and treat it as a probability density function (pdf). Using this, we can calculate the cumulative distribution function (CDF). (3/6)
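The step that follows is inverting that CDF with a uniform random number, so that bright samples are picked proportionally more often, and weighting each sample by 1/pdf to keep the estimate unbiased. A self-contained sketch on a made-up toy 1D signal:

```cpp
#include <cstdio>
#include <vector>
#include <random>
#include <algorithm>

int main()
{
    // Tiny 1D "signal" standing in for, say, environment map luminance.
    std::vector<float> signal = { 0.1f, 0.2f, 5.0f, 0.3f, 4.0f, 0.1f };

    // Normalise into a pdf and accumulate into a CDF.
    float total = 0.0f;
    for (float s : signal) total += s;
    std::vector<float> cdf(signal.size());
    float accum = 0.0f;
    for (size_t i = 0; i < signal.size(); ++i) {
        accum += signal[i] / total;
        cdf[i] = accum;
    }

    // Inverse-transform sampling: a uniform u lands in bucket i with
    // probability pdf[i], so bright buckets get chosen more often.
    std::mt19937 rng(123);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    const int numSamples = 1000;
    float estimate = 0.0f;
    for (int s = 0; s < numSamples; ++s) {
        size_t i = std::lower_bound(cdf.begin(), cdf.end(), uniform(rng)) - cdf.begin();
        i = std::min(i, signal.size() - 1);  // guard against float round-off
        float pdf = signal[i] / total;
        estimate += signal[i] / pdf;         // Monte Carlo: f(x) / pdf(x)
    }
    estimate /= numSamples;

    // For this toy case the estimate is exact, because the pdf is proportional
    // to the integrand itself; a real convolution also has BRDF/cosine terms,
    // so some variance always remains.
    printf("estimated sum %.3f (true sum %.3f)\n", estimate, total);
    return 0;
}
```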
Question from DMs: "So can the GPU automatically generate a Hi-Z pyramid?"
The confusion comes from a GPU feature often called HiZ (especially on AMD GPUs): for every tile of pixels (say 4x4 or 8x8), the GPU stores a min and max depth value in a special buffer while rendering. (1/4)
Every time a tile of pixels belonging to the same triangle arrives, the GPU compares the min/max depth stored in that buffer for the tile against the min/max depth values of the incoming pixel tile. (2/4)
If, for example, the min depth of all the pixels in the new tile is larger than the max depth stored in the corresponding HiZ entry, the GPU rejects the whole tile. If not, it updates the min/max values of the HiZ entry and goes on to process the tile further. (3/4)
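A sketch of that per-tile test in code, assuming a "less" depth test (smaller z = closer); purely illustrative, since the real thing is fixed-function hardware with more involved update rules:

```cpp
#include <algorithm>

// One HiZ entry covers a tile of pixels (e.g. 8x8).
struct HiZEntry { float minZ, maxZ; };

// tileMinZ/tileMaxZ: depth bounds of the incoming triangle's pixels in the tile.
// Returns true if the whole tile can be rejected without per-pixel testing.
bool hiZTest(HiZEntry& entry, float tileMinZ, float tileMaxZ)
{
    if (tileMinZ > entry.maxZ)
        return true; // nearest incoming pixel is behind the farthest stored depth

    // Tile survives: conservatively fold its near bound into the entry and let
    // per-pixel depth testing take over. (Hardware can also tighten maxZ when
    // it can prove the whole tile was overwritten; skipped in this sketch.)
    entry.minZ = std::min(entry.minZ, tileMinZ);
    (void)tileMaxZ;
    return false;
}
```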
I got an interesting question via Twitter today: why would a single, high polycount mesh (>5m tris) render slowly? Without knowing more about the platform and actual use, off the top of my head (thread - please add any potential reasons I have missed):
The vertex buffer may be too fat in terms of data types and/or store more data than we need. "Smaller" formats (bytes, halfs, or even fixed point) may help.
Also, using a structure-of-arrays layout (a different vertex stream per attribute -- position, normal etc) can make it easier to bind only the vertex data that we need and to increase cache hits.
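To illustrate both points, a sketch of what that slimming down might look like; the packed formats here are example choices, not a prescription:

```cpp
#include <cstdint>
#include <cstdio>

// One fat interleaved vertex: every pass pays for all 32 bytes.
struct FatVertex {
    float position[3]; // 12 bytes
    float normal[3];   // 12 bytes
    float uv[2];       //  8 bytes
};

// One stream per attribute, each in a smaller format:
struct PackedPosition { uint16_t x, y, z, pad; }; // 16-bit fixed point within the mesh bounds
struct PackedNormal   { int8_t  x, y, z, pad; };  // 8-bit signed-normalised
struct PackedUV       { uint16_t u, v; };         // 16-bit (half or normalised)

int main()
{
    // A depth-only or shadow pass can now bind just the position stream:
    // 8 bytes per vertex instead of 32.
    printf("fat: %zu, pos: %zu, nrm: %zu, uv: %zu bytes\n",
           sizeof(FatVertex), sizeof(PackedPosition),
           sizeof(PackedNormal), sizeof(PackedUV));
    return 0;
}
```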
Whether you are working on PBR, shadows, area lights or GI, it always helps to have a "ground truth" raytraced image for reference. If you can't make your own, Mitsuba is an easy-to-use pathtracer that can give good results. mitsuba-renderer.org (thread)
You can control the number of light bounces to emulate a more "traditional" game environment, without an advanced GI solution, to focus on the shape of the shadow or the response of a material to a dynamic light.
Also, if you are learning graphics programming, it is very educational to visualise how light interacts with matter in a more realistic way and to see the impact of changing various material properties (I definitely recommend creating your own pathtracer as well!).
Lately I pushed it even further, raytracing the more triangle-heavy San Miguel scene, which also added alpha testing. BVH creation became a bottleneck; a binned BVH build made it faster but also made creating high quality trees trickier (and the shader more complex), impacting raytracing cost.
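For context, the core of a binned build is replacing the exact sweep over every possible split position with a fixed number of bins; a sketch of that step (my own illustration, not the code from the tweet):

```cpp
#include <cfloat>
#include <vector>
#include <algorithm>

struct AABB {
    float lo[3] = {  FLT_MAX,  FLT_MAX,  FLT_MAX };
    float hi[3] = { -FLT_MAX, -FLT_MAX, -FLT_MAX };
    void grow(const AABB& b) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], b.lo[a]);
            hi[a] = std::max(hi[a], b.hi[a]);
        }
    }
    float area() const { // surface area; 0 for an empty box
        float dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
        return dx < 0.0f ? 0.0f : 2.0f * (dx * dy + dy * dz + dz * dx);
    }
};

struct Bin { AABB bounds; int count = 0; };

// Pick the cheapest of (numBins - 1) candidate planes along one axis.
// centroid[i] is triangle i's centroid on that axis; cmin/cmax bound the
// centroids (cmax > cmin assumed). Returns the SAH cost, sets bestPlane.
float binnedSAH(const std::vector<AABB>& triBounds,
                const std::vector<float>& centroid,
                float cmin, float cmax, int numBins, int& bestPlane)
{
    std::vector<Bin> bins(numBins);
    float scale = numBins / (cmax - cmin);
    for (size_t i = 0; i < triBounds.size(); ++i) {
        int b = std::min(numBins - 1, int((centroid[i] - cmin) * scale));
        bins[b].bounds.grow(triBounds[i]);
        bins[b].count++;
    }

    // Prefix/suffix sweeps give bounds and counts on both sides of each plane.
    std::vector<float> leftArea(numBins), rightArea(numBins);
    std::vector<int>   leftCount(numBins), rightCount(numBins);
    AABB acc; int cnt = 0;
    for (int i = 0; i < numBins - 1; ++i) {
        acc.grow(bins[i].bounds); cnt += bins[i].count;
        leftArea[i] = acc.area(); leftCount[i] = cnt;
    }
    acc = AABB(); cnt = 0;
    for (int i = numBins - 1; i > 0; --i) {
        acc.grow(bins[i].bounds); cnt += bins[i].count;
        rightArea[i - 1] = acc.area(); rightCount[i - 1] = cnt;
    }

    // Relative SAH cost (constants dropped): area * primitive count per side.
    float bestCost = FLT_MAX; bestPlane = -1;
    for (int i = 0; i < numBins - 1; ++i) {
        if (leftCount[i] == 0 || rightCount[i] == 0) continue;
        float cost = leftArea[i] * leftCount[i] + rightArea[i] * rightCount[i];
        if (cost < bestCost) { bestCost = cost; bestPlane = i; }
    }
    return bestCost;
}
```

Fewer bins means a faster build but coarser split choices, which is exactly the quality/speed trade-off mentioned above.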
Someone at work asked me today where I find all those presentations about graphics techniques, which made me realise that it might not be such common knowledge to people just starting gfx programming. Thread of links.
Programming with compute shaders (efficiently), balancing workloads with resources and thinking in parallel, gives many opportunities to learn how GPUs really work (well, pretty close at least). A few links to get you started. (1/N)
A common theme in the questions I have received so far is that beginners feel intimidated by graphics programming and do not know how to start. They need not be though, as graphics programming can be approached in different ways and at many levels of complexity. (1/5)
Once you feel comfortable with it and get a feeling for how (pixel) shaders work, you could try an easy-access engine like Unity to see how shader programming works in the context of a full game engine (with lighting and props). (3/5)