In anticipation of the Intl. Conf. on Computer Vision (#ICCV2021) this week, I rounded up all papers in the main #ICCV2021 conference that use Neural Radiance Fields (NeRFs), collected here (1/N):
dellaert.github.io/NeRF21
Many of the papers I discussed in my original blog post on NeRF (dellaert.github.io/NeRF/) made it into CVPR, but the sheer number of NeRF-style papers that appeared on arXiv this year meant I could no longer keep up. 2/N
Conferences like #ICCV2021 (along with CVPR, the top-tier computer vision conferences) provide an (imperfect) filter, so I decided to read all the papers I could find in the ICCV main program. I share them with you below and in the archival blog post. Email me if any are missing! 3/N
NeRF, of course, was introduced in the (recent but already seminal) Neural Radiance Fields paper by Mildenhall et al. at ECCV 2020. A NeRF stores a volumetric scene representation as the weights of an MLP, trained on many images with known pose. 4/N
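A minimal sketch of how such a representation is turned into a pixel, assuming the standard NeRF quadrature (alpha compositing of density and color samples along each camera ray). The `toy_sphere` field is a hypothetical stand-in for the trained MLP, it ignores view direction, and the samples are evenly spaced rather than randomly jittered as in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

RGB = Tuple[float, float, float]
# A radiance field maps a 3D point to (density sigma, RGB color); view direction omitted here.
Field = Callable[[Sequence[float]], Tuple[float, RGB]]


def render_ray(field: Field, origin: Sequence[float], direction: Sequence[float],
               t_near: float, t_far: float, n_samples: int = 64) -> Tuple[RGB, List[float]]:
    """Composite C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i along one ray."""
    delta = (t_far - t_near) / n_samples
    color = [0.0, 0.0, 0.0]
    weights: List[float] = []
    transmittance = 1.0  # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    for i in range(n_samples):
        t = t_near + (i + 0.5) * delta  # evenly spaced bin midpoints
        point = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = field(point)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            color[k] += w * rgb[k]
        transmittance *= 1.0 - alpha
    return (color[0], color[1], color[2]), weights


def toy_sphere(point: Sequence[float]) -> Tuple[float, RGB]:
    """Hypothetical stand-in for the trained MLP: a dense red ball of radius 1."""
    inside = sum(p * p for p in point) < 1.0
    return (50.0 if inside else 0.0), (1.0, 0.0, 0.0)


rgb, weights = render_ray(toy_sphere, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0), 0.0, 6.0)
```

The per-sample `weights` are also what NeRF's hierarchical sampling reuses to place finer samples.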
At #ICCV2021, several papers address the fundamentals of view-synthesis with NeRF-like methods in the original, fully-posed multi-view setup: 5/N
Mip-NeRF (jonbarron.info/mipnerf/) addresses the severe aliasing artifacts of vanilla NeRF by adapting the mip-map idea from graphics: instead of point-sampling along each viewing ray, it integrates over conical frustums along the ray. 6/N #ICCV2021
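The key trick can be illustrated per coordinate: Mip-NeRF approximates each conical frustum by a Gaussian and encodes the *expected* sinusoid under that Gaussian, which has a closed form. A minimal sketch assuming a diagonal covariance, i.e. one mean and variance per coordinate, taken as given here rather than derived from the cone geometry:

```python
import math
from typing import List


def positional_encoding(x: float, num_freqs: int) -> List[float]:
    """Vanilla NeRF encoding of one coordinate: [sin(2^l x), cos(2^l x)] for l < L."""
    out: List[float] = []
    for l in range(num_freqs):
        out += [math.sin(2.0 ** l * x), math.cos(2.0 ** l * x)]
    return out


def integrated_positional_encoding(mean: float, var: float, num_freqs: int) -> List[float]:
    """Expected encoding for x ~ N(mean, var): E[sin(a x)] = sin(a * mean) * exp(-a^2 * var / 2)."""
    out: List[float] = []
    for l in range(num_freqs):
        scale = 2.0 ** l
        damping = math.exp(-0.5 * scale * scale * var)  # high frequencies fade for wide frustums
        out += [math.sin(scale * mean) * damping, math.cos(scale * mean) * damping]
    return out
```

With zero variance this reduces exactly to the vanilla encoding; as the frustum widens (e.g. far from the camera or at low resolution), the high-frequency features are suppressed instead of aliasing.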
MVSNeRF (apchenstu.github.io/mvsnerf/) trains a model across many scenes and then renders new views conditioned on only a few posed input views, using intermediate voxelized features that encode the volume to be rendered. 7/N #ICCV2021
DietNeRF (arxiv.org/abs/2104.00677) is a very out-of-the-box method that supervises NeRF training with a semantic loss, computed by comparing CLIP embeddings of renderings from arbitrary views, so it can learn a NeRF from a single view for arbitrary categories. 8/N #ICCV2021
UNISURF (arxiv.org/abs/2104.10078) proposes to replace NeRF's density with occupancy, and hierarchical sampling with root-finding, allowing both volume and surface rendering and yielding much improved geometry. 9/N #ICCV2021
NerfingMVS (weiyithu.github.io/NerfingMVS) uses a sparse depth map from an SfM pipeline to train a scene-specific depth network, which then guides NeRF's adaptive sampling strategy. 10/N #ICCV2021
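A minimal sketch of the two rendering modes, assuming an occupancy function along a single ray is given (the sigmoid-shaped `occ` below is hypothetical). Volume rendering uses occupancy directly as alpha. Surface rendering finds the first 0.5-crossing with a coarse march followed by secant-style refinement; the sample counts are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional

Occupancy = Callable[[float], float]  # occupancy in [0, 1] at depth t along one ray


def occupancy_weights(occ: List[float]) -> List[float]:
    """Volume rendering with occupancy as alpha: w_i = o_i * prod_{j<i} (1 - o_j)."""
    weights, transmittance = [], 1.0
    for o in occ:
        weights.append(o * transmittance)
        transmittance *= 1.0 - o
    return weights


def find_surface(occ_fn: Occupancy, t_near: float, t_far: float, n_coarse: int = 64,
                 n_secant: int = 8, level: float = 0.5) -> Optional[float]:
    """Depth of the first free-to-occupied crossing o(t) = level, or None if the ray misses."""
    ts = [t_near + (t_far - t_near) * i / (n_coarse - 1) for i in range(n_coarse)]
    f = [occ_fn(t) - level for t in ts]
    for i in range(n_coarse - 1):
        if f[i] < 0.0 <= f[i + 1]:  # first sign change along the ray
            a, b, fa, fb = ts[i], ts[i + 1], f[i], f[i + 1]
            for _ in range(n_secant):
                t = a - fa * (b - a) / (fb - fa)  # secant step; fa < 0 <= fb keeps fb - fa > 0
                ft = occ_fn(t) - level
                if ft < 0.0:
                    a, fa = t, ft
                else:
                    b, fb = t, ft
            return a - fa * (b - a) / (fb - fa)
    return None


def occ(t: float) -> float:
    """Hypothetical smooth occupancy of a slab covering 2.2 < t < 4.2 along the ray."""
    return 1.0 / (1.0 + math.exp(-20.0 * (1.0 - abs(t - 3.2))))
```

Here `find_surface(occ, 0.0, 6.0)` lands at the front face of the slab, near depth 2.2.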
The slow rendering and training of NeRF prompted many more papers on speeding up NeRF, mostly focused on rendering: 11/N
FastNeRF (arxiv.org/abs/2103.10380) factorizes the NeRF function into two branches, one depending only on position and one only on viewing direction, whose outputs are combined to give results comparable to NeRF while allowing much more efficient caching, yielding a 3000x speedup. 12/N #ICCV2021
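A minimal sketch of why the factorization enables caching. It assumes a formulation in which a position-only branch outputs a density plus D RGB components and a direction-only branch outputs D weights, with the color given by their weighted sum. Both branches below are hypothetical closed-form stand-ins for the two MLPs, and D = 8 is an arbitrary choice.

```python
import math
from typing import Dict, List, Sequence, Tuple

D = 8  # number of factorized components (arbitrary for this sketch)
RGB = Tuple[float, float, float]
Key = Tuple[float, float, float]


def f_pos(p: Sequence[float]) -> Tuple[float, List[RGB]]:
    """Hypothetical position branch: density sigma and D components (u_i, v_i, w_i)."""
    sigma = math.exp(-sum(x * x for x in p))
    comps = [(math.sin((i + 1) * p[0]), math.cos((i + 1) * p[1]), math.sin((i + 1) * p[2]))
             for i in range(D)]
    return sigma, comps


def f_dir(d: Sequence[float]) -> List[float]:
    """Hypothetical direction branch: D mixing weights beta_i."""
    return [0.5 * math.cos(i * d[0]) + 0.5 * d[2] for i in range(D)]


def combine(comps: List[RGB], beta: List[float]) -> RGB:
    """Color as a beta-weighted sum of the position-dependent components."""
    r, g, b = (sum(w * c[k] for w, c in zip(beta, comps)) for k in range(3))
    return (r, g, b)


def build_caches(points: List[Key], directions: List[Key]) -> Tuple[Dict, Dict]:
    """Cache each branch on its own grid: |points| + |directions| entries,
    instead of |points| * |directions| for a network taking both inputs jointly."""
    pos_cache = {p: f_pos(p) for p in points}
    dir_cache = {d: f_dir(d) for d in directions}
    return pos_cache, dir_cache


def cached_query(pos_cache: Dict, dir_cache: Dict, p: Key, d: Key) -> Tuple[float, RGB]:
    """Rendering-time lookup: two table reads and a weighted sum, no network evaluation."""
    sigma, comps = pos_cache[p]
    return sigma, combine(comps, dir_cache[d])
```

A real renderer would interpolate between cached grid entries; exact-key lookups keep the sketch short.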
KiloNeRF (github.com/creiser/kilone…) replaces a single large NeRF-MLP with thousands of tiny MLPs, accelerating rendering by 3 orders of magnitude. 13/N #ICCV2021
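With thousands of tiny MLPs, each one is responsible for a cell of a uniform 3D grid over the scene's bounding box, so every sample point must first be routed to its network. A minimal sketch of that routing step, assuming an axis-aligned bounding box and a 16 x 16 x 16 grid (4096 networks); the resolution is an illustrative choice, and the tiny MLPs themselves are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Index3 = Tuple[int, int, int]


def cell_index(point: Sequence[float], bbox_min: Sequence[float],
               bbox_max: Sequence[float], res: Index3) -> Index3:
    """Grid cell (i, j, k), i.e. which tiny MLP is responsible for a 3D point."""
    idx = []
    for x, lo, hi, n in zip(point, bbox_min, bbox_max, res):
        u = (x - lo) / (hi - lo) * n
        idx.append(min(max(int(u), 0), n - 1))  # clamp points on or outside the boundary
    return (idx[0], idx[1], idx[2])


def flat_index(ijk: Index3, res: Index3) -> int:
    """Row-major network id in [0, res[0] * res[1] * res[2])."""
    i, j, k = ijk
    return (i * res[1] + j) * res[2] + k


def group_by_network(points: List[Sequence[float]], bbox_min: Sequence[float],
                     bbox_max: Sequence[float], res: Index3) -> Dict[int, List[int]]:
    """Batch sample indices per network so each tiny MLP runs once over its own points."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for n, p in enumerate(points):
        groups[flat_index(cell_index(p, bbox_min, bbox_max, res), res)].append(n)
    return dict(groups)


RES = (16, 16, 16)
BOX_MIN, BOX_MAX = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
```

Grouping matters for speed: evaluating many points through one small network at once is far cheaper than dispatching per point.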
PlenOctrees (alexyu.net/plenoctrees/) introduces NeRF-SH, which uses spherical harmonics to model view-dependent color, and then compresses it into an octree-based data structure, rendering the result 3000x faster than NeRF. 14/N #ICCV2021
SNeRG (arxiv.org/abs/2103.14645) precomputes and "bakes" a NeRF into a new Sparse Neural Radiance Grid (SNeRG) representation, enabling real-time rendering. 15/N #ICCV2021
RtS (arxiv.org/abs/2108.04886) focuses on rendering derivatives efficiently and correctly for a variety of surface representations, including NeRF, using a fast "Surface NeRF" or sNeRF renderer. 16/N #ICCV2021
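A minimal sketch of NeRF-SH color evaluation, assuming one common real spherical-harmonics sign convention, a sigmoid on the output, and truncation at degree 1 (4 coefficients per color channel) for brevity; higher SH degrees just add more coefficients. In a PlenOctree, each leaf stores such coefficients along with a density.

```python
import math
from typing import Sequence, Tuple

SH_C0 = 0.28209479177387814  # 1 / (2 sqrt(pi)), degree 0
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 sqrt(pi)), degree 1


def sh_basis_deg1(d: Sequence[float]) -> Tuple[float, float, float, float]:
    """Real SH basis up to degree 1 for a unit direction d = (x, y, z)."""
    x, y, z = d
    return (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def sh_color(coeffs: Sequence[Sequence[float]], d: Sequence[float]) -> Tuple[float, float, float]:
    """View-dependent RGB: sigmoid(sum_lm k_lm * Y_lm(d)) per channel; coeffs[c] has 4 entries."""
    norm = math.sqrt(sum(v * v for v in d))
    basis = sh_basis_deg1([v / norm for v in d])
    r, g, b = (sigmoid(sum(k * y for k, y in zip(ch, basis))) for ch in coeffs)
    return (r, g, b)
```

Only the degree-0 coefficient is view-independent; a nonzero degree-1 coefficient makes the color brighter on one side and darker on the opposite side.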
Another trend is removing the need for (exact) pose supervision. It started with 'NeRF--' (on arXiv) and is pursued by no fewer than three papers at ICCV: 17/N
GNeRF (arxiv.org/abs/2103.15606) distinguishes itself from other pose-free NeRF efforts with a "rough initial pose" network, trained GAN-style à la GRAF, which solves the (hard) initialization problem. 18/N #ICCV2021
BARF (chenhsuanlin.bitbucket.io/bundle-adjusti…) optimizes for the scene and the camera poses simultaneously, as in "bundle adjustment", in a coarse-to-fine manner. 19/N #ICCV2021
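The coarse-to-fine part works by gradually switching on the frequency bands of NeRF's positional encoding, so that early pose updates see a smooth signal. A minimal sketch of that band weighting, assuming a linear ramp of the progress parameter `alpha` between two hypothetical iteration counts (`start`, `end`):

```python
import math
from typing import List


def barf_weight(alpha: float, k: int) -> float:
    """Weight of frequency band k: 0 before alpha reaches k, cosine ramp, then 1."""
    if alpha < k:
        return 0.0
    if alpha - k < 1.0:
        return 0.5 * (1.0 - math.cos((alpha - k) * math.pi))
    return 1.0


def barf_encoding(x: float, alpha: float, num_freqs: int) -> List[float]:
    """Encoding of one coordinate: identity followed by weighted [cos, sin] bands."""
    out = [x]
    for k in range(num_freqs):
        w = barf_weight(alpha, k)
        out += [w * math.cos(2.0 ** k * math.pi * x), w * math.sin(2.0 ** k * math.pi * x)]
    return out


def barf_alpha(step: int, start: int, end: int, num_freqs: int) -> float:
    """Linearly ramp alpha from 0 to num_freqs between iterations start and end."""
    progress = min(max((step - start) / (end - start), 0.0), 1.0)
    return progress * num_freqs
```

At `alpha = 0` only the identity input survives; at `alpha = num_freqs` the full encoding is restored.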
SCNeRF (postech-cvlab.github.io/SCNeRF/) is similar to BARF, but additionally optimizes over intrinsics, including radial distortion and per-pixel non-linear distortion. 20/N #ICCV2021
One of the largest areas of activity, at least in terms of number of papers, is conditioning NeRF-like models on various latent codes: 21/N
GRF (github.com/alextrevithick…) is, like PixelNeRF and IBRNet at CVPR, closer to image-based rendering, where only a few images are used at test time. Unlike PixelNeRF, GRF operates in a canonical space rather than in view space. 22/N #ICCV2021
GSN (apple.github.io/ml-gsn/) is a generative model for *scenes*: it takes a global code that is translated into a grid of local codes, each associated with a local radiance model. A small convnet helps upscale the final output. 23/N #ICCV2021
GANcraft (nvlabs.github.io/GANcraft) translates a semantic block world into a set of voxel-bound NeRF models that allow rendering of photorealistic images corresponding to this “Minecraft” world, additionally conditioned on a style latent code. 24/N #ICCV2021
CodeNeRF (sites.google.com/view/wbjang/ho…) trains a GRAF-style conditional NeRF (with separate shape and appearance latent codes) and then optimizes over both latent codes *and* the object pose at inference time. 25/N #ICCV2021
Conditional NeRFs are the bread and butter of efforts that do various cool things with composing scenes: 26/N
EditNeRF (editnerf.csail.mit.edu) learns a category-specific conditional NeRF model, inspired by GRAF but with an instance-agnostic branch, and shows a variety of strategies to interactively edit both color and shape. 27/N #ICCV2021
ObjectNeRF (zju3dv.github.io/object_nerf/) trains a voxel embedding feeding two pathways: scene and objects. By modifying the voxel embedding the objects can be moved, cloned, or removed. 28/N #ICCV2021
At least four efforts focus on dynamic scenes, using a variety of schemes, including some that I already discussed earlier: 29/N
Nerfies (nerfies.github.io) and its underlying D-NeRF model deformable scenes from video, using a second MLP that applies a per-frame deformation. 30/N #ICCV2021
NeRFlow (yilundu.github.io/nerflow/) is a concurrent effort, which learns "a single consistent continuous spatial-temporal radiance field that is constrained to generate consistent 4D view synthesis across both space and time". 31/N #ICCV2021
NR-NeRF (vcai.mpi-inf.mpg.de/projects/nonri…) also uses a deformation MLP to model non-rigid scenes. It has no reliance on pre-computed scene 32/N #ICCV2021
AD-NeRF (yudongguo.github.io/ADNeRF/) trains a conditional NeRF from a short video with audio, concatenating DeepSpeech features and head pose to the input, enabling new audio-driven synthesis as well as editing of the input clip. 33/N #ICCV2021
Finally, a cool trend is skeleton-driven NeRFs, which promise to be useful for animating avatars and the like: 34/N
NARF (github.com/nogu-atsu/NARF) uses pose supervision to train a small *local* occupancy network per articulated part, which is then used to modulate a conditionally trained NeRF model. 35/N #ICCV2021
AnimatableNeRF (zju3dv.github.io/animatable_ner…) uses a tracked skeleton from mocap data and multi-view video to train skeleton-based blend fields that then transform the radiance field, enabling skeleton-driven synthesis of human avatars. 36/N #ICCV2021
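Skeleton-driven methods of this kind typically rely on linear blend skinning (LBS): a point observed in the posed frame is mapped back to a shared canonical frame by inverting the weighted blend of per-bone transforms, and the radiance field is queried there. A minimal sketch of that inverse-LBS step, assuming the per-point blend weights are already given (here they stand in for the learned blend field) and each bone transform is a 3x3 matrix plus a translation:

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]


def det3(m: Mat3) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: Mat3, v: Sequence[float]) -> Vec3:
    """Solve m x = v with Cramer's rule (adequate for a well-conditioned 3x3)."""
    d = det3(m)
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        out.append(det3(mc) / d)
    return (out[0], out[1], out[2])


def forward_lbs(x_canonical: Sequence[float], weights: Sequence[float],
                rotations: Sequence[Mat3], translations: Sequence[Sequence[float]]) -> Vec3:
    """Canonical -> posed: x_p = sum_k w_k * (R_k x_c + t_k)."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        for r in range(3):
            out[r] += w * (sum(R[r][c] * x_canonical[c] for c in range(3)) + t[r])
    return (out[0], out[1], out[2])


def inverse_lbs(x_posed: Sequence[float], weights: Sequence[float],
                rotations: Sequence[Mat3], translations: Sequence[Sequence[float]]) -> Vec3:
    """Posed -> canonical: x_c = A^{-1} (x_p - b), A = sum_k w_k R_k, b = sum_k w_k t_k."""
    A = [[sum(w * R[r][c] for w, R in zip(weights, rotations)) for c in range(3)]
         for r in range(3)]
    b = [sum(w * t[r] for w, t in zip(weights, translations)) for r in range(3)]
    return solve3(A, [x_posed[r] - b[r] for r in range(3)])
```

For fixed weights the two functions are exact inverses; in a full system the weights themselves vary with position, which is why they are predicted by a field rather than stored per vertex.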
That's all folks :-) Again, if you see any that are missing (check the blog post: dellaert.github.io/NeRF21) let me know in DM or via email. N/N #ICCV2021 #NeRF
Turns out I had left out a category :-) Here are five other very cool papers using NeRF technology that defy easy categorization: (N+1)/N
iMAP (edgarsucar.github.io/iMAP) is an awesome paper that uses NeRF as the scene representation in an online visual SLAM system, learning a 3D scene online while tracking a moving camera against it. (N+2)/N #ICCV2021
MINE (vincentfung13.github.io/projects/mine/) learns to predict a density/color multi-plane representation, conditioned on a single image, which can then be used for NeRF-style volume rendering. (N+3)/N #ICCV2021
Semantic-NeRF (shuaifengzhi.com/Semantic-NeRF/) adds a segmentation renderer before viewing directions are injected into NeRF, and generates high-resolution semantic labels for a scene from only partial, noisy, or low-resolution semantic supervision. (N+4)/N #ICCV2021
CO3D (github.com/facebookresear…) contributes an *amazing* dataset of annotated object videos, and evaluates 15 methods on single-scene reconstruction and learning 3D object categories, including a new SOTA “NerFormer” model. (N+5)/N #ICCV2021
CryoDRGN2 (openaccess.thecvf.com/content/ICCV20…) attacks the challenging problem of reconstructing protein structure *and* pose from a "multiview" set of cryo-EM *density* images. It is unique among NeRF-style papers as it works in the Fourier domain. (N+6) #ICCV2021
And a paper I missed, thanks @jbhuang0604: DynamicVS (free-view-video.github.io) attacks the very challenging free-viewpoint video synthesis problem, using scene-flow prediction along with *many* regularization terms to produce impressive results. (N+7) #ICCV2021
Finally (?) I added a session agenda for #NeRF at #ICCV2021 to the blog post: dellaert.github.io/NeRF21 (N+8)
Another paper I missed (in session 10): NeRD (markboss.me/publication/20…) or “Neural Reflectance Decomposition” uses physically-based rendering to decompose the scene into spatially varying BRDF material properties, enabling re-lighting of the scene. (N+9) #ICCV2021

Thread by Frank Dellaert (@fdellaert)
