In anticipation of the Intl. Conf. on Computer Vision (#ICCV2021) this week, I rounded up all papers that use Neural Radiance Fields (NeRFs) represented in the main #ICCV2021 conference here (1/N): dellaert.github.io/NeRF21
Many of the papers I discussed in my original blog post on NeRF (dellaert.github.io/NeRF/) made it into CVPR, but the sheer number of NeRF-style papers that appeared on arXiv this year meant I could no longer keep up. 2/N
Conferences like #ICCV2021 (with CVPR, the top-tier Computer Vision conference) provide an (imperfect) filter, and I decided to read all the papers I could find in the ICCV main program. I share them with you below and in the archival blog post. Email me if any are missing! 3/N
NeRF, of course, was introduced in the (recent but already seminal) Neural Radiance Fields paper by Mildenhall et al. at ECCV 2020. A NeRF stores a volumetric scene representation as the weights of an MLP, trained on many images with known pose. 4/N
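To make "volumetric scene representation" concrete, here is a minimal sketch of the compositing step used in NeRF-style volume rendering for a single ray. In a real NeRF the densities and colors come from querying the MLP at sample points along the ray; here they are hand-written lists, and the function name `composite_ray` is made up for illustration.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray, front to back.

    sigmas: volume density at each sample
    colors: RGB triple at each sample
    deltas: distance between consecutive samples
    Returns the rendered pixel color and the per-sample weights.
    """
    transmittance = 1.0          # fraction of light that survives so far
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, weights
```

Because every step is differentiable in the densities and colors, a photometric loss on `pixel` can be back-propagated into whatever network produced them.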
At #ICCV2021, several papers address the fundamentals of view-synthesis with NeRF-like methods in the original, fully-posed multi-view setup: 5/N
Mip-NeRF (jonbarron.info/mipnerf/) addresses the severe aliasing artifacts of vanilla NeRF by adapting the mip-map idea from graphics, replacing point sampling of the light field with integration over conical frustums along the viewing rays. 6/N #ICCV2021
MVSNeRF (apchenstu.github.io/mvsnerf/) trains a model across many scenes and then renders new views conditioned on only a few posed input views, using intermediate voxelized features that encode the volume to be rendered. 7/N #ICCV2021
DietNeRF (arxiv.org/abs/2104.00677) is a very out-of-the-box method that supervises the NeRF training process with a semantic loss, created by evaluating arbitrary views using CLIP, so it can learn a NeRF from a single view for arbitrary categories. 8/N #ICCV2021
UNISURF (arxiv.org/abs/2104.10078) proposes to replace the density in NeRF with occupancy, and hierarchical sampling with root-finding, which allows both volume and surface rendering and gives much improved geometry. 9/N #ICCV2021
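As a rough illustration of root-finding on an occupancy field (not UNISURF's actual procedure), the sketch below marches along a ray until occupancy crosses 0.5 and then refines the crossing by bisection. Bisection is swapped in here for simplicity, and the `occupancy` function is a hypothetical stand-in (a soft unit sphere) for a learned network.

```python
import math

def occupancy(p):
    """Hypothetical stand-in for a learned occupancy field: a soft unit sphere."""
    r = math.sqrt(sum(c * c for c in p))
    return 1.0 / (1.0 + math.exp(20.0 * (r - 1.0)))

def point_on_ray(o, d, t):
    """Point at depth t along the ray with origin o and direction d."""
    return tuple(oc + t * dc for oc, dc in zip(o, d))

def find_surface(o, d, t_near=0.0, t_far=4.0, n_steps=64, n_bisect=30):
    """Return the depth of the first 0.5-occupancy crossing, or None if the ray misses."""
    step = (t_far - t_near) / n_steps
    t_prev = t_near
    if occupancy(point_on_ray(o, d, t_prev)) >= 0.5:
        return t_prev  # the ray starts inside the object
    for i in range(1, n_steps + 1):
        t = t_near + i * step
        if occupancy(point_on_ray(o, d, t)) >= 0.5:
            lo, hi = t_prev, t  # bracket: free space at lo, occupied at hi
            for _ in range(n_bisect):
                mid = 0.5 * (lo + hi)
                if occupancy(point_on_ray(o, d, mid)) >= 0.5:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        t_prev = t
    return None
```

Once a surface depth is found, samples can be concentrated around it instead of being spread along the whole ray, which is the intuition behind replacing hierarchical sampling.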
NerfingMVS (weiyithu.github.io/NerfingMVS) uses a sparse depth map from an SfM pipeline to train a scene-specific depth network that subsequently guides the adaptive sampling strategy in NeRF. 10/N #ICCV2021
The slow rendering/training of NeRF prompted many more papers on speeding up NeRF, mostly focused on rendering: 11/N
FastNeRF (arxiv.org/abs/2103.10380) factorizes the NeRF volume rendering equation into two branches that are combined to give the same results as NeRF, but allow for much more efficient caching, yielding a 3000x speedup. 12/N #ICCV2021
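A toy sketch of why a two-branch factorization caches well, assuming the branches are a position-only function and a direction-only function combined by an inner product. The stand-in functions, the component count `D`, and all names here are hypothetical and not taken from the FastNeRF code.

```python
import math
from functools import lru_cache

D = 4  # number of factorized components (hypothetical value)

@lru_cache(maxsize=None)
def position_branch(p):
    """Stand-in for a position-only network: density plus D RGB components."""
    x, y, z = p
    sigma = abs(math.sin(x + y + z))
    comps = tuple((math.sin(i + x), math.cos(i + y), math.sin(i * z)) for i in range(D))
    return sigma, comps

@lru_cache(maxsize=None)
def direction_branch(d):
    """Stand-in for a direction-only network: D mixing weights."""
    dx, dy, dz = d
    return tuple(math.cos(i * dx + dy - dz) for i in range(D))

def radiance(p, d):
    """Combine the two cached branches with an inner product over the D components."""
    sigma, comps = position_branch(p)
    beta = direction_branch(d)
    rgb = tuple(sum(b * c[k] for b, c in zip(beta, comps)) for k in range(3))
    return sigma, rgb
```

With P positions and V directions, each branch is evaluated only P or V times rather than P times V, since a joint position-and-direction function would need a cache entry per pair.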
KiloNeRF (github.com/creiser/kilone…) replaces a single large NeRF-MLP with thousands of tiny MLPs, accelerating rendering by 3 orders of magnitude. 13/N #ICCV2021
PlenOctrees (alexyu.net/plenoctrees/) introduces NeRF-SH, which uses spherical harmonics to model view-dependent color, and then compresses that into an octree-like data structure, rendering the result 3000x faster than NeRF. 14/N #ICCV2021
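To illustrate what "spherical harmonics to model view-dependent color" means, here is a minimal sketch that evaluates a degree-1 real SH expansion per color channel and squashes the result with a sigmoid. The basis ordering and sign convention are assumptions (conventions differ between codebases), and a practical system would use higher degrees.

```python
import math

SH_C0 = 0.28209479177387814   # constant (degree-0) basis function
SH_C1 = 0.4886025119029199    # magnitude of the degree-1 basis functions

def sh_basis_deg1(d):
    """Real SH basis up to degree 1 for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]

def sh_color(coeffs, d):
    """coeffs: three lists (R, G, B) of 4 SH coefficients; d: unit view direction."""
    basis = sh_basis_deg1(d)
    rgb = []
    for channel in coeffs:
        raw = sum(k * b for k, b in zip(channel, basis))
        rgb.append(1.0 / (1.0 + math.exp(-raw)))  # sigmoid keeps colors in (0, 1)
    return rgb
```

The point of storing coefficients is that they depend only on position, so they can be precomputed per voxel; the view direction enters only through the cheap basis evaluation at render time.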
SNeRG (arxiv.org/abs/2103.14645) precomputes and "bakes" a NeRF into a new Sparse Neural Radiance Grid (SNeRG) representation, enabling real-time rendering. 15/N #ICCV2021
RtS (arxiv.org/abs/2108.04886) focuses on rendering derivatives efficiently and correctly for a variety of surface representations, including NeRF, using a fast "Surface NeRF" or sNerF renderer. 16/N #ICCV2021
Another trend is to remove the need for (exact) pose supervision, which started with 'NeRF--' (on arXiv), and is done by no fewer than three papers at ICCV: 17/N
GNeRF (arxiv.org/abs/2103.15606) distinguishes itself from other pose-free NeRF efforts by virtue of a "rough initial pose" network, trained GAN-style à la GRAF, which solves the (hard) initialization problem. 18/N #ICCV2021
SCNeRF (postech-cvlab.github.io/SCNeRF/) is similar to BARF, but additionally optimizes over intrinsics, including radial distortion and per-pixel non-linear distortion. 20/N #ICCV2021
One of the largest areas of activity, at least in terms of number of papers, is conditioning NeRF-like models on various latent codes: 21/N
GRF (github.com/alextrevithick…) is, like PixelNeRF and IBRNet at CVPR, closer to image-based rendering, where only a few images are used at test time. Unlike PixelNeRF, GRF operates in a canonical space rather than in view space. 22/N #ICCV2021
GSN (apple.github.io/ml-gsn/) is a generative model for *scenes*: it takes a global code that is translated into a grid of local codes, each associated with a local radiance model. A small convnet helps upscale the final output. 23/N #ICCV2021
GANcraft (nvlabs.github.io/GANcraft) translates a semantic block world into a set of voxel-bound NeRF models that allows rendering of photorealistic images corresponding to this “Minecraft” world, additionally conditioned on a style latent code. 24/N #ICCV2021
CodeNeRF (sites.google.com/view/wbjang/ho…) trains a GRAF-style conditional NeRF (with a shape and an appearance latent code) and then optimizes at inference time over both latent codes *and* the object pose. 25/N #ICCV2021
Conditional NeRFs are the bread and butter of efforts that do various cool things with composing scenes: 26/N
EditNeRF (editnerf.csail.mit.edu) learns a category-specific conditional NeRF model, inspired by GRAF but with an instance-agnostic branch, and shows a variety of strategies to edit both color and shape interactively. 27/N #ICCV2021
ObjectNeRF (zju3dv.github.io/object_nerf/) trains a voxel embedding feeding two pathways: scene and objects. By modifying the voxel embedding the objects can be moved, cloned, or removed. 28/N #ICCV2021
At least four efforts focus on dynamic scenes, using a variety of schemes, including some that I already discussed earlier: 29/N
Nerfies (nerfies.github.io), with its underlying D-NeRF, models deformable videos using a second MLP that applies a deformation for each frame of the video. 30/N #ICCV2021
NeRFlow (yilundu.github.io/nerflow/) is a concurrent effort, which learns "a single consistent continuous spatial-temporal radiance field that is constrained to generate consistent 4D view synthesis across both space and time". 31/N #ICCV2021
AD-NeRF (yudongguo.github.io/ADNeRF/) trains a conditional NeRF from a short video with audio, concatenating DeepSpeech features and head pose to the input, enabling new audio-driven synthesis as well as editing of the input clip. 33/N #ICCV2021
Finally, a cool trend is skeleton-driven NeRFs, which promise to be useful for animating avatars and the like: 34/N
NARF (github.com/nogu-atsu/NARF) uses pose supervision to train a small *local* occupancy network per articulated part, which is then used to modulate a conditionally trained NeRF model. 35/N #ICCV2021
AnimatableNeRF (zju3dv.github.io/animatable_ner…) uses a tracked skeleton from mocap data and multi-view video to train skeleton-based blend fields that then transform the radiance field, enabling skeleton-driven synthesis of people's avatars. 36/N #ICCV2021
That's all folks :-) Again, if you see any that are missing (check the blog post: dellaert.github.io/NeRF21) let me know in DM or via email. N/N #ICCV2021 #NeRF
Turns out I had left out a category :-) Here are four other very cool papers using NeRF-technology that defy easy categorization: (N+1)/N
iMAP (edgarsucar.github.io/iMAP) is an awesome paper that uses NeRF as the scene representation in an online visual SLAM system, learning a 3D scene online and tracking a moving camera against it. (N+2)/N #ICCV2021
MINE (vincentfung13.github.io/projects/mine/) learns to predict a density/color multi-plane representation, conditioned on a single image, which can then be used for NeRF-style volume rendering. (N+3)/N #ICCV2021
Semantic-NeRF (shuaifengzhi.com/Semantic-NeRF/) adds a segmentation renderer before injecting viewing directions into NeRF and generates high-resolution semantic labels for a scene with only partial, noisy, or low-resolution semantic supervision. (N+4)/N #ICCV2021
CO3D (github.com/facebookresear…) contributes an *amazing* dataset of annotated object videos, and evaluates 15 methods on single-scene reconstruction and learning 3D object categories, including a new SOTA “NerFormer” model. (N+5)/N #ICCV2021
CryoDRGN2 (openaccess.thecvf.com/content/ICCV20…) attacks the challenging problem of reconstructing protein structure *and* pose from a "multiview" set of cryo-EM *density* images. It is unique among NeRF-style papers as it works in the Fourier domain. (N+6) #ICCV2021
And a paper I missed - thanks @jbhuang0604: DynamicVS (free-view-video.github.io) attacks the very challenging free-viewpoint video synthesis problem, and uses scene-flow prediction along with *many* regularization terms to produce impressive results. (N+7) #ICCV2021
Another paper I missed (in session 10): NeRD (markboss.me/publication/20…) or “Neural Reflectance Decomposition” uses physically-based rendering to decompose the scene into spatially varying BRDF material properties, enabling re-lighting of the scene. (N+9) #ICCV2021
2020 was the year in which *neural volume rendering* exploded onto the scene, triggered by the impressive NeRF paper by Mildenhall et al. I wrote a post as a way of getting up to speed in a fascinating and very young field and share my journey with you: dellaert.github.io/NeRF/
The precursors to NeRF are approaches that use an *implicit* surface representation. At CVPR 2019, 3 papers introduced the use of neural nets as *scalar function approximators* to define occupancy and/or signed distance functions.
Occupancy networks (avg.is.tuebingen.mpg.de/publications/o…) introduce implicit, coordinate-based learning of occupancy. A network consisting of 5 ResNet blocks takes a feature vector and a 3D point and predicts binary occupancy.