Jia-Bin Huang

6 Jul 20, 14 tweets, 5 min read

Sharing one idea I found useful for paper writing:

Do NOT ask people to solve correspondence problems.

Some Dos and Don'ts examples below:

*Figures*: Don't ask people to match (a), (b), (c) ... with the descriptions in the figure caption.

*Figure caption*

Use a "self-contained" caption. It's annoying to dig into the text and match it to the figures. Ain't nobody got time for that! ⌚️

Also, add a figure "caption title" (in bold fonts). It allows readers to navigate through figures quickly.
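
A minimal LaTeX sketch of a self-contained caption with a bold caption title. The image, label, and caption wording are placeholders and are not taken from any real paper:

```latex
\begin{figure}
  \centering
  % Placeholder image; example-image ships with the mwe package
  \includegraphics[width=\linewidth]{example-image}
  % Bold title first, then everything needed to read the figure on its own
  \caption{\textbf{Caption title that states the takeaway.}
    The rest of the caption explains what is shown and how to read it,
    so the figure can be understood without searching the main text.}
  \label{fig:placeholder}
\end{figure}
```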

*Notations*

Give specific, meaningful names to your math notations. That way, readers won't need to go back and forth to figure out what each term means.
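
As a small illustration, assuming a hypothetical objective with three loss terms (the terms and weights are made up for this sketch), compare numbered subscripts with named ones:

```latex
% Requires \usepackage{amsmath} for \text
% Harder to follow: the reader must remember what terms 1, 2, 3 mean
\[
  \mathcal{L} = \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3
\]
% Easier to follow: each subscript names the term
\[
  \mathcal{L} = \mathcal{L}_{\text{color}}
    + \lambda_{\text{depth}} \mathcal{L}_{\text{depth}}
    + \lambda_{\text{smooth}} \mathcal{L}_{\text{smooth}}
\]
```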

*Which*

I found that many of my students love to use "which" in their sentences. I hate it ... because I often cannot figure out what exactly "which" refers to. Break the sentence down into simple sentences and spell out what the subject of each sentence is.

*Respectively*

It's hard to parse which item corresponds to which in a sentence that ends with "respectively" (the reader has to solve a long-range correspondence problem). Break such sentences up so that each sentence talks about one thing.

*Citations*

People like to use many acronyms for their methods. It may be hard for readers to memorize/match which method/dataset/metric you are referring to. Adding citations is an easy way to fix this.
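
A minimal sketch of the idea. The acronyms and BibTeX keys below are hypothetical placeholders:

```latex
% Harder to follow: the reader must recall what each acronym refers to
We compare with ABC and XYZ on the DEF dataset.

% Easier to follow: each acronym is tied to its reference
We compare with ABC~\cite{abc_placeholder} and XYZ~\cite{xyz_placeholder}
on the DEF dataset~\cite{def_placeholder}.
```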

*Names for notations*

When using notations in the sentences, mention their "names" as well. The readers won't need to flip through your paper to look up what these notations mean.

*Connect figures with equations, notations, and sections*

I view the overview figure in a paper as a centralized hub that connects all the important equations, notations, and sections in one place. This makes it easy for people to understand how everything fits together.

*Tables*

Factorize the variants/attributes of different methods so that it is easy to compare one with another.

*One table, one message*

Decompose your big table so that each table conveys exactly one thing. This saves people from having to compare results from distant rows. Multiple smaller tables get the point across more easily. (Don't worry about the redundancy.)

*Group subfigures*

Don't ask readers to figure out the grouping (b-c) and (d-e) from the caption when you can group them explicitly in the figure.

How to create an underbracket? Example:

$\underbracket[1pt][2.0mm]{\hspace{\FIGWIDTH}}_%
{\substack{\vspace{-2.0mm}\\
\colorbox{white}{(a) Input}}}$
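
For context, here is a sketch of how the snippet above might sit in a complete document. `\underbracket` comes from the mathtools package and `\colorbox` from xcolor. The definition of `\FIGWIDTH`, the placeholder images, and the caption are assumptions made for this example:

```latex
\documentclass{article}
\usepackage{mathtools} % provides \underbracket and loads amsmath (\substack)
\usepackage{xcolor}    % provides \colorbox
\usepackage{graphicx}

% Assumed: total width of the panels to be grouped (value is hypothetical)
\newcommand{\FIGWIDTH}{0.6\linewidth}

\begin{document}
\begin{figure}
  \centering
  % Two placeholder panels (from the mwe package) that belong to one group
  \includegraphics[width=0.3\linewidth]{example-image-a}%
  \includegraphics[width=0.3\linewidth]{example-image-b}\\
  % The bracket spans both panels and carries the group label
  $\underbracket[1pt][2.0mm]{\hspace{\FIGWIDTH}}_%
  {\substack{\vspace{-2.0mm}\\
  \colorbox{white}{(a) Input}}}$
  \caption{\textbf{Placeholder title.} Placeholder caption text.}
\end{figure}
\end{document}
```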

*Parallelism*

When applicable, use parallel grammatical structure in your sentences. It helps readers easily parse the parallel concepts you want to convey.

*Table organization*

Merge tables sharing the same structure. Label the metric (the larger/smaller the better) with up-arrow and down-arrow so that your readers don't need to look them up.
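
A minimal sketch of metric labels with arrows. PSNR (higher is better) and LPIPS (lower is better) are used only as familiar examples, and the entries are intentionally left blank:

```latex
% Requires \usepackage{booktabs}
\begin{tabular}{lcc}
  \toprule
  Method   & PSNR $\uparrow$ & LPIPS $\downarrow$ \\
  \midrule
  Baseline & --              & --                 \\
  Ours     & --              & --                 \\
  \bottomrule
\end{tabular}
```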

*Shape attributes*

Leverage the shape attributes (color, thickness) to encode the message.

Also, use a deemphasized image in the background to avoid mental matching.

More from @jbhuang0604

Jia-Bin Huang

@jbhuang0604

21 Jan

Get into your slides!

I recently found an easy setup to get into my slides. Compared to the standard Zoom setup, it's fun, engaging, and allows me to interact with the slide contents directly.

Check out the thread below and set it up for your own presentation!

I mainly follow the excellent video tutorial by @cem_yuksel
but with a poor man's variant (i.e., without a white background or green screen).

Make sure to check out the videos for the best quality!

Step 1: Download Open Broadcaster Software (OBS) studio. obsproject.com

Why: We will use OBS to composite the slides and your camera video feed together and create a "virtual camera".

You can then use the virtual camera for your video conferencing presentation.

Jia-Bin Huang

@jbhuang0604

14 Jan

Neural Volume Rendering for Dynamic Scenes

NeRF has shown incredible view synthesis results, but it requires multi-view captures for STATIC scenes.

How can we achieve view synthesis for DYNAMIC scenes from a single video? Here is what I learned from several recent efforts.

Instead of presenting Video-NeRF, Nerfies, NR-NeRF, D-NeRF, NeRFlow, NSFF (and many others!) as individual algorithms, here I try to view them from a unifying perspective and understand the pros/cons of various design choices.

Okay, here we go.

*Background*

NeRF represents the scene as a 5D continuous volumetric scene function that maps the spatial position and viewing direction to color and density. It then projects the colors/densities to form an image with volume rendering.
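
For reference, the scene function and the volume rendering step are commonly written as below. The symbols are not from this thread; they follow the usual NeRF notation, with position $\mathbf{x}$, viewing direction $\mathbf{d}$, color $\mathbf{c}$, density $\sigma$, and a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ between near and far bounds $t_n$ and $t_f$:

```latex
% Scene function: position and viewing direction -> color and density
\[
  F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma)
\]
% Expected color of ray r: density-weighted colors, attenuated by transmittance T
\[
  C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,
    \mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
  \qquad
  T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
\]
```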

Volumetric + Implicit -> Awesome!

Jia-Bin Huang

@jbhuang0604

13 Jan

Semi-supervised learning with consistency regularization and pseudo-labeling works great for CLASSIFICATION.

But how about STRUCTURED PREDICTION tasks? 🤔

Check out @ylzou_Zack's #ICLR2021 paper on designing pseudo-labels for semantic segmentation.
yuliang.vision/pseudo_seg/

How do we get pseudo labels from unlabeled images?

Unlike classification, directly thresholding the network outputs for dense prediction doesn't work well.

Our idea: start with weakly sup. localization (Grad-CAM) and refine it with self-attention for propagating the scores.

Using two different prediction mechanisms is great bc they make errors in different ways. With our fusion strategy, we get WELL-CALIBRATED pseudo labels (see the expected calibration errors in E below) and IMPROVED accuracy under 1/4, 1/8, 1/16 of labeled examples.

Jia-Bin Huang

@jbhuang0604

19 Dec 20

Sharing some LaTeX hacks I like (and trying to crowdsource more)!

*Teaser*

Popularized by Randy Pausch's paper in 1996, now most papers start with a teaser. Make sure that you have an awesome one.

\twocolumn[{
\renewcommand\twocolumn[1][]{#1}
\maketitle
\input{teaser}
}]
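
The thread does not show what `teaser.tex` contains. One possible sketch, assuming the caption package is loaded: floats cannot be placed inside the optional argument of `\twocolumn`, so the figure is set in a `center` environment and captioned with `\captionof`. The image and caption text are placeholders:

```latex
% teaser.tex (hypothetical contents)
% Requires \usepackage{graphicx} and \usepackage{caption} in the preamble
\begin{center}
  % Placeholder image from the mwe package, spanning both columns
  \includegraphics[width=\textwidth]{example-image}
  % \captionof produces a numbered figure caption outside a float
  \captionof{figure}{\textbf{Teaser title.} One or two sentences that
    summarize the main result of the paper.}
  \label{fig:teaser}
\end{center}
```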

*Table formatting*

I feel that 10% of my job is to replace \hline with \toprule, \midrule, and \bottomrule. Formatting your table well will help you convey your messages much more clearly.

Check out: people.inf.ethz.ch/markusp/teachi…
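
A small before/after sketch of the replacement, using the booktabs package. The table contents are placeholders:

```latex
% Before: every rule is an \hline
\begin{tabular}{lc}
  \hline
  Method & Metric \\
  \hline
  A      & --     \\
  B      & --     \\
  \hline
\end{tabular}

% After: requires \usepackage{booktabs}
\begin{tabular}{lc}
  \toprule
  Method & Metric \\
  \midrule
  A      & --     \\
  B      & --     \\
  \bottomrule
\end{tabular}
```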

*Quickly remove in-line comments*

This hack helps you quickly estimate the paper length w/o comments, which is particularly helpful when you are close to the submission deadline!

\usepackage{ifthen}
\newcommand{\final}{1} % 1 = hide comments, 0 = show them
\ifthenelse{\equal{\final}{1}}
{
\renewcommand{\jiabin}[1]{} % \jiabin must already be defined as the comment macro
}{}
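
For a complete picture, here is a minimal sketch of the same toggle in a full document. The definition of the comment macro `\jiabin` is an assumption (the thread does not show it), and the sentence in the body is a placeholder:

```latex
\documentclass{article}
\usepackage{ifthen}
\usepackage{xcolor}

% Assumed definition of the in-line comment macro (not shown in the thread)
\newcommand{\jiabin}[1]{\textcolor{red}{[Jia-Bin: #1]}}

% 1 = hide comments (final version), 0 = show comments (draft)
\newcommand{\final}{1}
\ifthenelse{\equal{\final}{1}}
{
  % Redefine the macro to discard its argument
  \renewcommand{\jiabin}[1]{}
}{}

\begin{document}
Our method works well.\jiabin{Add numbers to back this up.}
\end{document}
```

Setting `\final` to 0 brings the comments back without touching the text.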

Jia-Bin Huang

@jbhuang0604

13 Dec 20

Have you ever wondered why papers from top universities/research labs often appear in the top few positions in the daily email and web announcements from arXiv?

Why is that the case? Why should I care?

Wait a minute! Does the article position even matter?

It matters!

See arxiv.org/abs/0907.4740

-> Articles in position 1 received median numbers of citations 83%, 50%, and 100% higher than those lower down in three communities.

So you get a significantly higher visibility boost, wider readership, and long-term citations and impacts by ...

simply getting your paper into the top position in the article listing!

Crazy huh?

Jia-Bin Huang

@jbhuang0604

12 Dec 20

How can we turn casual videos into 3D? Excited to share our work on Robust Consistent Video Depth Estimation.

Project: robust-cvd.github.io
Paper: arxiv.org/abs/2012.05901

w/ @JPKopf @jastarex

Check out the 🧵below!

We start by examining our Consistent Video Depth Estimation (CVD) in SIGGRAPH 2020 (work led by the amazing @XuanLuo14).

roxanneluo.github.io/Consistent-Vid…

The method achieves AWESOME results but requires precise camera poses as inputs.

Isn't SLAM/SfM a SOLVED problem? You might ask.

Yes, it works pretty well for static and controlled environments. For casual videos, existing methods usually fail to register all frames or produce outlier poses with large errors.

As a result, CVD works only *when SfM works*.