Tweet

https://twitter.com/jbhuang0604/status/1279992089611247617

More from @jbhuang0604

Jia-Bin Huang

@jbhuang0604

21 Jan

Get into your slides!

I recently found an easy setup to get into my slides. Compared to the standard zoom setup, it's fun, engaging, and allows me to interact with the slide contents directly.

Check out the thread below and set it up for your own presentation!

@cem_yuksel

I mainly follow the excellent video tutorial by @cem_yuksel
but with a poor man's variant (i.e., without a white background or green screen).

Make sure to check out the videos for the best quality!

Step 1: Download Open Broadcaster Software (OBS) studio. obsproject.com

Why: We will use OBS to composite the slides and your camera video feed together and create a "virtual camera".

You can then use the virtual camera for your video conferencing presentation.

Read 8 tweets

Jia-Bin Huang

@jbhuang0604

14 Jan

Neural Volume Rendering for Dynamic Scenes

NeRF has shown incredible view synthesis results, but it requires multi-view captures for STATIC scenes.

How can we achieve view synthesis for DYNAMIC scenes from a single video? Here is what I learned from several recent efforts.

Instead of presenting Video-NeRF, Nerfie, NR-NeRF, D-NeRF, NeRFlow, NSFF (and many others!) as individual algorithms, here I try to view them from a unifying perspective and understand the pros/cons of various design choices.

Okay, here we go.

*Background*

NeRF represents the scene as a 5D continuous volumetric scene function that maps the spatial position and viewing direction to color and density. It then projects the colors/densities to form an image with volume rendering.

Volumetric + Implicit -> Awesome!

Read 16 tweets

Jia-Bin Huang

@jbhuang0604

13 Jan

@ylzou_Zack

Semi-supervised learning with consistency regularization and pseudo-labeling works great for CLASSIFICATION.

But how about STRUCTURED PREDICTION tasks? 🤔

Check out @ylzou_Zack's #ICLR2021 paper on designing pseudo-labels for semantic segmentation.
yuliang.vision/pseudo_seg/

How do we get pseudo labels from unlabeled images?

Unlike classification, directly thresholding the network outputs for dense prediction doesn't work well.

Our idea: start with weakly sup. localization (Grad-CAM) and refine it with self-attention for propagating the scores.

Using two different prediction mechanisms is great bc they make errors in different ways. With our fusion strategy, we get WELL-CALIBRATED pseudo labels (see the expected calibration errors in E below) and IMPROVED accuracy under 1/4, 1/8, 1/16 of labeled examples.

Read 6 tweets

Jia-Bin Huang

@jbhuang0604

13 Dec 20

Have you ever wondered why papers from top universities/research labs often appear in the top few positions in the daily email and web announcements from arXiv?

Why is that the case? Why should I care?

Wait a minute! Does the article position even matter?

It matters!

See arxiv.org/abs/0907.4740

-> Articles in position 1 received median numbers of citations 83%, 50%, and 100% higher than those lower down in three communities.

So you get a significantly higher visibility boost, wider readership, and long-term citations and impacts by ...

simply putting your paper on the top position in the articles!

Crazy huh?

Read 6 tweets

Jia-Bin Huang

@jbhuang0604

12 Dec 20

@JPKopf

How can we turn causal videos into 3D? Excited to share our work on Robust Consistent Video Depth Estimation.

Project: robust-cvd.github.io
Paper: arxiv.org/abs/2012.05901

w/ @JPKopf @jastarex

Check out the 🧵below!

@XuanLuo14

We start by examining our Consistent Video Depth Estimation (CVD) in SIGGRAPH 2020 (work led by the amazing @XuanLuo14).

roxanneluo.github.io/Consistent-Vid…

The method achieves AWESOME results but requires precise camera poses as inputs.

Isn't SLAM/SfM a SOLVED problem? You might ask.

Yes, it works pretty well for static and controlled environments. For causal videos, existing methods usually fail to register all frames or produce outlier poses with large errors.

As a result, CVD works only *when SFM works*.

Read 9 tweets

Jia-Bin Huang

@jbhuang0604

11 Dec 20

@gaochen315

How can we learn NeRF from a SINGLE portrait image? Check out @gaochen315's recent work leverages new (meta-learning) and old (3D morphable model) tricks to make it work! This allows us to synthesize new views and manipulate FOV.

Project: portrait-nerf.github.io

@gaochen315

Work led by the amazing Chen Gao (@gaochen315) and in collaboration with friends from Google (Yichang Shih, Wei-Sheng Lai, and Chia-Kai Liang).

Paper: arxiv.org/abs/2012.05903

So, how does it work?

Training a NeRF from a single image from scratch won't work because it cannot recover the correct shape. The rendering results look fine at the original viewpoint but produce large errors at novel views.