NeRFs are getting attention these days!
However, widespread adoption is still slow. But why?

It comes down to a) file size and b) rendering.
An engineer's viewpoint:

TLDR: NeRFs are fundamentally a bad fit for today's edge device architectures.

Let's explain that in detail:
1/
a) A quality NeRF can easily take up many GB of (dedicated) memory on the GPU. Most consumer devices today are miles away from this.
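To put "many GB" in perspective, here's a rough back-of-envelope for a dense grid-style radiance field (the resolution and channel counts below are illustrative assumptions, not the numbers of any particular method):

# Back-of-envelope GPU memory for a dense grid-based radiance field.
# All numbers are illustrative assumptions, not measurements of any specific method.
resolution = 1024          # voxels per axis (assumption)
channels = 4               # e.g. color + density features per voxel (assumption)
bytes_per_value = 2        # float16 storage

total_bytes = resolution ** 3 * channels * bytes_per_value
print(f"{total_bytes / 1e9:.1f} GB")   # ~8.6 GB before any compression or sparsity

Sparse data structures bring that down, but multi-GB footprints for high-quality scenes are still common.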
But even if devices could handle it (they can't) there is still the open question of delivery / transmission.
While you could, presumably
2/
compress things down to a few 100 MB, that would still be > 20x larger than a textured mesh of the same scene.
But even if all of a sudden the whole world got 5G everywhere and loading a few 100 MB per NeRF was tolerable (it's not), such compression would likely be
3/
global in nature, meaning that as a user you'd have to wait for all the data to arrive before decompression is possible and the NeRF can be loaded into GPU memory for viewing. This is in contrast to videos, meshes, etc., which have efficient methods for compressed streaming, which
4/
means (in the case of video) you can start watching while more is loaded in the background. With today's tech this simply isn't possible for NeRFs.

There are also lines of NeRF research today that treat the entire scene volume as a Neural Net and are able to achieve
5/
very impressive compression that way (on the order of a few to tens of MBs per NeRF!).
However, the big caveat is that bringing those NeRFs on screen requires many (slow) NN evaluations, pushing render times to minutes (!) per frame. So that isn't practical
6/
for consumers either.
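To see where those minutes per frame come from, a rough estimate in Python (every number below is an illustrative assumption):

# Rough cost estimate for rendering one frame of a pure-MLP NeRF.
# All numbers are illustrative assumptions.
pixels = 1920 * 1080           # one Full-HD frame
samples_per_ray = 192          # ray-marching steps per pixel (assumption)
flops_per_eval = 1e6           # an ~8-layer, 256-wide MLP is on this order (assumption)

total_flops = pixels * samples_per_ray * flops_per_eval
print(f"{total_flops:.1e} FLOPs per frame")                        # ~4e14 FLOPs
print(f"{total_flops / 10e12:.0f} s at a sustained 10 TFLOP/s")    # ~40 s per frame

With hierarchical sampling, higher resolutions, and less-than-peak GPU utilization, this quickly stretches into minutes.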

But let's assume for now that NeRF delivery were a solved problem (it's not). We still haven't covered the

b) rendering side of things, which is equally challenging.
To understand why, we need to recall what NeRFs are:
neural radiance FIELDS
7/
You can think of a NeRF as an infinitely dense cloud of points that gather around the scene surfaces, and which change their appearance depending on the viewing angle.
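In code terms, a NeRF is just a function from a 3D point and a viewing direction to a color and a density. A toy stand-in to illustrate the interface (a real NeRF replaces this with a trained neural network):

import numpy as np

# Toy stand-in for a radiance field: a soft blob of density around the origin
# whose color shifts slightly with the viewing direction.
# A real NeRF replaces this with a trained neural network.
def toy_radiance_field(point, view_direction):
    density = np.exp(-4.0 * np.dot(point, point))            # how "solid" the field is here
    rgb = np.clip(0.5 + 0.5 * view_direction, 0.0, 1.0)      # view-dependent color
    return rgb, density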
@jonstephens85 has a great talk on the topic if you want to learn more:

8/
However, what NeRFs are NOT is a scene surface description. And that's a problem.

To see why, we need to realise that, historically, the whole point of a GPU was to
* accelerate the conversion of triangle meshes to pixels on a screen and then to
* paint the pixels in colors
9/
that make the rendering look realistic.

And while triangles as a primitive are extremely flexible, and while GPUs have evolved quite a bit since they were invented decades ago, everything is still centred around the concept of content that, at the lowest level, is
10/
described by surfaces, not volumetric, view-dependent radiance fields (a.k.a. NeRFs).

So, in order to render NeRFs at all, what's used today is a technique called ray-marching. Let's explain:

Instead of transforming surface triangles into screen-space, we treat every screen pixel as a
11/
ray that originates at the camera and extends into the NeRF through the given pixel.
To render the NeRF from the given viewpoint we progress along the ray with a small step size and accumulate the radiance that the NeRF emits at each step.
12/
If we do this for all our pixels we have a rendering of our NeRF from our chosen viewpoint.
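The accumulation loop looks roughly like this (a deliberately minimal sketch with a toy field standing in for the trained network; real renderers add importance sampling, batching, and so on):

import numpy as np

def toy_field(point, view_dir):
    # Stand-in for the trained NeRF: a soft density blob with view-dependent color.
    density = np.exp(-4.0 * np.dot(point, point))
    rgb = np.clip(0.5 + 0.5 * view_dir, 0.0, 1.0)
    return rgb, density

def render_pixel(ray_origin, ray_dir, near=0.0, far=4.0, n_steps=128):
    # March along one ray and accumulate radiance (simplified volume rendering).
    dt = (far - near) / n_steps
    color = np.zeros(3)
    transmittance = 1.0                          # how much light still reaches the camera
    for t in np.linspace(near, far, n_steps):
        p = ray_origin + t * ray_dir
        rgb, density = toy_field(p, ray_dir)
        alpha = 1.0 - np.exp(-density * dt)      # opacity of this small ray segment
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

# One ray through the scene; a full frame repeats this for every pixel.
print(render_pixel(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0])))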
The big caveat with this technique is that it is rather slow: Not only do we have a lot of pixels to cover (millions to 10s of millions), but also a lot of steps per pixel/ray to
13/
accumulate the radiances, which in total results in 100s of millions to billions of radiance evaluations per frame.
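A quick sanity check on that number (resolution and step count are just assumptions):

# Radiance evaluations needed for a single frame (illustrative assumptions).
pixels = 3840 * 2160          # a 4K frame, ~8.3 million pixels
steps_per_ray = 128           # ray-marching samples per pixel (assumption)
print(f"{pixels * steps_per_ray:.2e} evaluations per frame")   # ~1e9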
Only the highest-end GPUs can do this at interactive speeds.
For the average phone/tablet/PC this is entirely out of their league.
14/
So how do NeRF apps and services today solve this?

* Pre-rendering
* Conversion to Mesh

The Luma Labs viewer is a great example of this:
captures.lumalabs.ai/graceful-effec…
15/
The viewer has three modes:
1) Autoplay
2) Object centered
3) Panoramic

1) Autoplay is simply a pre-rendered video of the NeRF, but there's no interactivity.

2) The object-centered mode is fully interactive but is missing most of the things that are great about NeRFs

16/
... and that's because it converts the NeRF back to a mesh so that it can render fast. However, that also loses a lot of the visual fidelity that NeRFs offer.

3) The Panoramic mode is basically like the video mode with the difference that you can control the frame

17/
on display. However, all the frames are pre-rendered, so interactivity is limited to selecting one of them along a fixed trajectory.

And while it's not my intention to downplay the technological achievement here, it does give me some '90s vibes. Why?

18/
Before computers were fast enough to render real-time 3D graphics, some game developers cheated and simply put a video on screen with some embedded interactive elements.
"Star Wars: Rebel Assault" (1993) is a great example here:

19/
The key limitation was, of course, that there was simply no way to control the camera. So in many ways the interactivity felt superficial.

Today, with video being by far the most popular format of NeRF distribution, I feel very similarly about NeRFs.
20/
Cool tech, but not quite there yet.

Also, if video is the format of distribution, I think it's a fair question to ask: why make the detour through NeRFs in the 1st place and not just capture and share video directly?

That's not to say that everything is impossible, though.
21/
If we want interactive NeRFs for a broader audience, here are some possible ways forward:
A) Advanced mesh conversion: An example here would be
While not perfect, this line of research is just getting started, so I expect
22/
quite a bit of progress on this over the next years.
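For a rough idea of what the most basic form of mesh conversion looks like: sample the field's density on a grid and run marching cubes over it. In the sketch below, query_density is a hypothetical hook into a trained NeRF; real pipelines additionally bake appearance/textures and clean up the geometry:

import numpy as np
from skimage import measure   # pip install scikit-image

# Hedged sketch of basic NeRF-to-mesh conversion via marching cubes.
# `query_density` is a hypothetical function: (N, 3) points -> (N,) densities.
def nerf_to_mesh(query_density, resolution=256, bound=1.0, density_threshold=10.0):
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)        # (R, R, R, 3)
    densities = query_density(grid.reshape(-1, 3)).reshape(grid.shape[:3])
    # Extract the surface where the density crosses the chosen threshold.
    verts, faces, normals, _ = measure.marching_cubes(densities, level=density_threshold)
    return verts, faces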

B) Hybrids: Most NeRFs, on most pixels, for most viewpoints, look just like textured meshes, which makes me think that perhaps we could just use a textured mesh as a base and then train a neural net to encode the residual
23/
of the rendering, as opposed to the entire scene. "Difficult surfaces" (glass, shiny materials, etc.) may have to be handled separately, though.
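A very rough, speculative sketch of that hybrid (rasterize_mesh and the tiny network below are hypothetical placeholders, not an existing system):

import torch

# Speculative sketch: rasterize a textured mesh as the base image, then let a
# small network predict only the per-pixel residual (view-dependent correction).
class ResidualNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3 + 3, 32, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, base_rgb, view_dirs):
        # base_rgb: (B, 3, H, W) rasterized image; view_dirs: (B, 3, H, W) per-pixel view direction
        return self.net(torch.cat([base_rgb, view_dirs], dim=1))

def render_hybrid(mesh, camera, residual_net, rasterize_mesh):
    # `rasterize_mesh` is a hypothetical fast GPU rasterizer that returns the base
    # image plus a per-pixel view-direction map.
    base_rgb, view_dirs = rasterize_mesh(mesh, camera)
    return base_rgb + residual_net(base_rgb, view_dirs)

Training would then fit the residual against the captured photos, so the network only has to learn what the mesh gets wrong.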

C) Outsourced rendering / Cloud streaming: Just render the NeRF on a fast machine in the cloud in real-time and stream the result to the
24/
edge device.

And that's a wrap!

I hope you enjoyed my engineer's deep dive through today's NeRF challenges and developments.

If you are interested in 3D computer vision and graphics topics, please consider giving me a follow.

Much❤️

#NeRF
