The method achieves AWESOME results but requires precise camera poses as input.
Isn't SLAM/SfM a SOLVED problem, you might ask?
Yes, it works pretty well for static and controlled environments. For casual videos, however, existing methods often fail to register all frames or produce outlier poses with large errors.
As a result, CVD works only *when SfM works*.
How can we make video depth estimation ROBUST?
Our idea: *Joint optimization* of depth and camera poses.
However, optimizing only a per-frame depth scale or finetuning the depth network results in poor camera pose trajectories (c-d) due to depth misalignment.
We resolve this problem by replacing the per-frame camera scale with a more flexible *spatially-varying transformation*. The improved alignment of the depth enables computing smoother and more accurate pose trajectories!
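A toy sketch of what a spatially-varying depth transformation could look like: instead of one scalar scale per frame, a coarse grid of log-scale/shift parameters is bilinearly upsampled and applied per pixel. (The function names and the log-scale/shift parameterization here are my illustration, not the paper's exact formulation; the parameter grids would be optimized jointly with the camera poses.)

```python
import numpy as np

def bilinear_upsample(grid, size):
    """Bilinearly resize a small 2-D parameter grid to `size` (h, w)."""
    gh, gw = grid.shape
    h, w = size
    ys = np.linspace(0, gh - 1, h)
    xs = np.linspace(0, gw - 1, w)
    y0 = np.clip(ys.astype(int), 0, gh - 2)
    x0 = np.clip(xs.astype(int), 0, gw - 2)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x0 + 1] * wx
    bot = grid[y0 + 1][:, x0] * (1 - wx) + grid[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy) + bot * wy

def deform_depth(depth, log_scale_grid, shift_grid):
    """Apply a spatially-varying scale/shift to a depth map.

    A coarse grid of parameters is upsampled to full resolution,
    allowing a low-frequency deformation of the depth map (more
    flexible than a single per-frame scale) that helps align
    depths across frames.
    """
    h, w = depth.shape
    scale = bilinear_upsample(log_scale_grid, (h, w))
    shift = bilinear_upsample(shift_grid, (h, w))
    return np.exp(scale) * depth + shift
```

Because the deformation grid is coarse, it can correct low-frequency misalignment without destroying the per-frame depth details.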
The flexible depth deformation is great, but can only achieve low-frequency alignment of depth maps. We further introduce a spatio-temporal geometry-aware depth filter (following flow trajectories) to improve fine depth details.
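A minimal sketch of filtering depth along flow trajectories, assuming forward optical flow is given: each pixel's depth is blended with the depth of its flow correspondence in the next frame, down-weighting pairs whose depths disagree (a simple geometry-aware gate). This is my two-frame illustration only; the actual method filters over longer trajectories and accounts for camera motion.

```python
import numpy as np

def filter_depth_along_flow(depths, flows_fwd, sigma_d=0.1):
    """Smooth per-pixel depth along optical-flow trajectories.

    depths:    list of (H, W) depth maps
    flows_fwd: list of (H, W, 2) flows; flows_fwd[t] maps
               pixels in frame t to frame t+1 (assumed given)
    """
    h, w = depths[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = [d.copy() for d in depths]
    for t in range(len(depths) - 1):
        # Follow the flow from frame t to the matching pixel in t+1.
        x2 = np.clip((xs + flows_fwd[t][..., 0]).round().astype(int), 0, w - 1)
        y2 = np.clip((ys + flows_fwd[t][..., 1]).round().astype(int), 0, h - 1)
        d_next = depths[t + 1][y2, x2]
        # Geometry-aware weight: trust correspondences with similar depth.
        wgt = np.exp(-((d_next - depths[t]) / sigma_d) ** 2)
        out[t] = (depths[t] + wgt * d_next) / (1 + wgt)
    return out
```

The Gaussian weight suppresses averaging across occlusions or bad flow, where the two depths disagree, so fine details sharpen instead of smearing.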
To validate the robustness, we show that we can estimate consistent depth and smooth camera poses on all 90 videos in the DAVIS dataset.
No cherry-picking!
Can't wait to see how this will lead to *3D-aware* video recognition/synthesis and beyond!
Also, please check out results on extracting long smooth camera trajectories.
Have you ever wondered why papers from top universities/research labs often appear in the top few positions in the daily email and web announcements from arXiv?
Why is that the case? Why should I care?
Wait a minute! Does the article position even matter?
How can we learn a NeRF from a SINGLE portrait image? Check out @gaochen315's recent work, which leverages new (meta-learning) and old (3D morphable model) tricks to make it work! This allows us to synthesize new views and manipulate FOV.
Training a NeRF from scratch on a single image won't work because it cannot recover the correct shape. The renderings look fine at the original viewpoint but produce large errors at novel views.
Congratulations Jinwoo Choi for passing his Ph.D. thesis defense!
Special thanks to the thesis committee members (Lynn Abbott, @berty38, Harpreet Dhillon, and Gaurav Sharma) for valuable feedback and advice.
Jinwoo started his Ph.D. by building an interactive system for home-based stroke rehabilitation, published at ASSETS 2017 and PETRA 2018.
These preliminary efforts laid the foundation for a recent $1.1 million NSF Smart and Connected Health award!
He then looked into scene biases in action recognition datasets and presented debiasing methods that lead to improved generalization in downstream tasks [Choi NeurIPS 19]. chengao.vision/SDN/