Discover and read the best of Twitter Threads about #V100

Back of the envelope calculation:
RTX 2080 Ti: 10 GRays/s @ 616 GB/s memory bandwidth ≈ 61 bytes/ray
1 triangle, 3 vertices of 32-bit float3: 48 bytes (assuming each vertex is padded to 16 bytes; tightly packed it would be 36 bytes)
61 - 48 = 13 bytes left for BVH traversal
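
The budget above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constants are the peak figures quoted in the thread, not measurements, and the variable and function names are made up for illustration. Note that the unrounded result is 61.6 bytes/ray, which the thread rounds down to 61.

```python
# Peak figures quoted in the thread (assumed, not measured).
RAYS_PER_SECOND = 10e9        # 10 GRays/s
MEM_BANDWIDTH_BYTES = 616e9   # 616 GB/s
TRIANGLE_BYTES = 48           # per-triangle size used in the thread


def bytes_per_ray(bandwidth_bytes_per_s, rays_per_s):
    """Bytes of memory traffic available per ray at peak throughput."""
    return bandwidth_bytes_per_s / rays_per_s


budget = bytes_per_ray(MEM_BANDWIDTH_BYTES, RAYS_PER_SECOND)
left_for_traversal = budget - TRIANGLE_BYTES

print(f"budget: {budget:.1f} bytes/ray")
print(f"left for BVH traversal: {left_for_traversal:.1f} bytes/ray")
```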

That assumes an ideal BVH that requires only one ray-triangle intersection test per ray.
Compressed wide BVH (research.nvidia.com/sites/default/…) requires 80 bytes per BVH node. A balanced BVH8 over 1 million triangles is 7 levels deep, so we're looking at 80 bytes * 7 = 560 bytes of node data processed per ray. Times 10 GRays/s = 5.6 TB/s of bandwidth just for BVH traversal.
#Volta #V100 has 12-14 TB/s shared memory bandwidth (arxiv.org/pdf/1804.06826…), so 10 GRays/s is plausible if most of the data fits in L1 cache/shared memory.
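
A rough sketch of the traversal estimate, under the same simplifications as the thread: exactly one node is fetched per level, leaves hold one triangle each, and nothing is reused between rays. The helper name `bvh_depth` and the constants are illustrative assumptions.

```python
NODE_BYTES = 80            # compressed wide BVH node size quoted in the thread
BRANCHING = 8              # BVH8
TRIANGLES = 1_000_000
RAYS_PER_SECOND = 10e9     # 10 GRays/s
V100_SHARED_TB_S = (12.0, 14.0)  # shared memory bandwidth range quoted in the thread


def bvh_depth(num_triangles, branching):
    """Smallest depth such that branching**depth >= num_triangles."""
    depth = 0
    capacity = 1
    while capacity < num_triangles:
        capacity *= branching
        depth += 1
    return depth


depth = bvh_depth(TRIANGLES, BRANCHING)
node_bytes_per_ray = NODE_BYTES * depth
traversal_tb_s = node_bytes_per_ray * RAYS_PER_SECOND / 1e12

print(f"depth: {depth} levels")
print(f"node data per ray: {node_bytes_per_ray} bytes")
print(f"traversal bandwidth: {traversal_tb_s:.1f} TB/s")
print(f"fits within V100 shared memory bandwidth: {traversal_tb_s < V100_SHARED_TB_S[0]}")
```

Real traversal visits more than one node per level whenever a ray overlaps several child boxes, so this is a lower bound rather than a prediction.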
V100 has 80 SMs with 128 KB L1/shared memory each, for a total of 10 MB. 10 MB isn't enough to fit a 7-level-deep BVH8.
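
To check the capacity claim, here is a sketch that counts the internal nodes of a tightly packed BVH8 over 1 million triangles and compares their size with the aggregate L1/shared memory. It assumes one triangle per leaf and perfectly filled nodes, which is optimistic, and it treats the per-SM memories as one pool even though each SM can only reach its own. All names are illustrative.

```python
NODE_BYTES = 80
BRANCHING = 8
TRIANGLES = 1_000_000
TRIANGLE_BYTES = 48
SM_COUNT = 80
L1_BYTES_PER_SM = 128 * 1024


def internal_node_count(num_leaves, branching):
    """Count internal nodes level by level, from the leaves up to the root."""
    total = 0
    level_size = num_leaves
    while level_size > 1:
        level_size = -(-level_size // branching)  # ceiling division
        total += level_size
    return total


nodes = internal_node_count(TRIANGLES, BRANCHING)
node_bytes = nodes * NODE_BYTES
triangle_bytes = TRIANGLES * TRIANGLE_BYTES
on_chip_bytes = SM_COUNT * L1_BYTES_PER_SM

print(f"internal nodes: {nodes}")
print(f"node data: {node_bytes / 2**20:.1f} MiB")
print(f"triangle data: {triangle_bytes / 2**20:.1f} MiB")
print(f"on-chip L1/shared: {on_chip_bytes / 2**20:.1f} MiB")
print(f"nodes alone fit on chip: {node_bytes <= on_chip_bytes}")
```

Even in this best case the node data alone slightly exceeds the 10 MB total, before counting any triangle data.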