Stefan Werner

@stefan_3d

Back of the envelope calculation:
RTX 2080Ti: 10GRay/s @ 616GB/s mem bandwidth = 61 bytes/Ray
1 triangle, 3x 32-bit float3 vertices: 48 bytes (with each vertex padded to 16 bytes; 36 bytes if tightly packed)
61 - 48 = 13 bytes left for BVH traversal
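
The same budget as a quick script, just to make the arithmetic checkable. The variable names are mine; the numbers are the ones quoted above, including the 48-byte triangle.

```python
# Per-ray memory budget, using the figures quoted in the thread.
ray_rate = 10e9         # rays per second (10 GRay/s)
mem_bandwidth = 616e9   # bytes per second (616 GB/s)
triangle_bytes = 48     # as stated above; 3 * 12 = 36 if tightly packed

bytes_per_ray = mem_bandwidth / ray_rate             # about 61.6
bytes_for_bvh = int(bytes_per_ray) - triangle_bytes  # rounds down to 61 first

print(f"budget per ray: {bytes_per_ray:.1f} bytes")
print(f"left for BVH traversal: {bytes_for_bvh} bytes")
```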

That would be under an ideal BVH that requires only 1 ray-triangle intersection per ray.

Compressed wide BVH (research.nvidia.com/sites/default/…) requires 80 bytes per BVH node. A balanced BVH8 over 1 million triangles is 7 levels deep, so we're looking at 80 bytes * 7 = 560 bytes of processed data per ray. Times 10 GRay/s = 5.6 TB/s of bandwidth just for BVH traversal.
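
A rough sketch of where the 7 levels and the 5.6 TB/s come from. It assumes a perfectly balanced tree in which every node has 8 children and a ray touches exactly one node per level; `bvh_levels` is a hypothetical helper, not part of any library.

```python
def bvh_levels(num_triangles, branching):
    """Smallest number of levels d such that branching**d >= num_triangles."""
    levels = 0
    capacity = 1
    while capacity < num_triangles:
        capacity *= branching
        levels += 1
    return levels

node_bytes = 80   # compressed wide BVH node size quoted above
ray_rate = 10e9   # rays per second (10 GRay/s)

levels = bvh_levels(1_000_000, 8)              # 8**6 < 1M <= 8**7
traversal_bytes_per_ray = node_bytes * levels  # one node visited per level
traversal_bandwidth = traversal_bytes_per_ray * ray_rate

print(f"levels: {levels}")
print(f"bytes per ray: {traversal_bytes_per_ray}")
print(f"bandwidth: {traversal_bandwidth / 1e12:.1f} TB/s")
```

A real traversal usually visits more than one node per level, so this is closer to a lower bound than an estimate.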

#Volta #V100 has 12-14TB/s shared memory bandwidth (arxiv.org/pdf/1804.06826…), so 10GRays/s is plausible if most of the data fits in L1 cache/shared mem.
V100 has 80 SMs with 128KB L1/shared mem each, a total of 10MB. 10MB isn't enough to fit a 7-level-deep BVH8.
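
To put a number on "not enough", here is a minimal sketch that counts the fewest 80-byte nodes a BVH8 over 1 million triangles could have. It assumes every node is completely full (8 children, with the bottom-level nodes referencing 8 triangles each), which is the most favorable case; `min_node_count` is a hypothetical helper.

```python
def min_node_count(num_triangles, branching=8):
    """Fewest nodes needed when every node is packed full, plus the level count."""
    nodes = 0
    levels = 0
    items = num_triangles
    while items > 1:
        items = -(-items // branching)  # ceiling division: nodes on this level
        nodes += items
        levels += 1
    return nodes, levels

node_bytes = 80        # compressed wide BVH node size
triangle_bytes = 48    # per-triangle size used earlier in the thread
num_triangles = 1_000_000

cache_bytes = 80 * 128 * 1024                 # 80 SMs * 128 KB each
nodes, levels = min_node_count(num_triangles)
bvh_bytes = nodes * node_bytes                # nodes only
scene_bytes = bvh_bytes + num_triangles * triangle_bytes

print(f"cache: {cache_bytes / 2**20:.1f} MiB")
print(f"nodes: {nodes} over {levels} levels, {bvh_bytes / 2**20:.1f} MiB")
print(f"nodes + triangles: {scene_bytes / 2**20:.1f} MiB")
```

Even in this best case the nodes alone slightly exceed the combined L1/shared memory, before counting any triangle data, and the combined figure treats the per-SM caches as one pool, which they are not.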
