Namuk Park
Deep Learning Research Scientist

Feb 15, 2022, 7 tweets

Our paper “How Do Vision Transformers Work?” was accepted as a Spotlight at #ICLR2022!!

We show that the success of ViTs is NOT due to their weak inductive bias or their ability to capture long-range dependencies.

paper: openreview.net/forum?id=D78Go…
code & summary: github.com/xxxnell/how-do…

👇 (1/7)

We address the following three key questions of multi-head self-attentions (MSAs) and ViTs:

Q1. What properties of MSAs do we need to better optimize NNs? 
Q2. Do MSAs act like Convs? If not, how are they different?
Q3. How can we harmonize MSAs with Convs?

(2/7)

Q1. What Properties of MSAs Do We Need?

MSAs have pros and cons. They improve NNs by flattening the loss landscape, and the key property behind this is their data specificity, not long-range dependency. On the other hand, ViTs suffer from non-convex losses.
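(A rough sketch, not the paper's code, of how one could probe the curvature the thread is talking about: estimate the top Hessian eigenvalue of the loss by power iteration on Hessian-vector products. The helper name `top_hessian_eigenvalue` and the toy model/batch are placeholders; swap in your own network and data.)

```python
import torch
import torch.nn as nn

def top_hessian_eigenvalue(loss, params, n_iters=20):
    """Power iteration on Hessian-vector products (autograd only)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u ** 2).sum() for u in v))
    v = [u / norm for u in v]
    eig = None
    for _ in range(n_iters):
        # Hessian-vector product: d(grad . v)/dparams
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v))   # Rayleigh quotient
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv)) + 1e-12
        v = [h / norm for h in hv]
    return eig.item()

# Toy stand-in model and batch, just so the snippet runs end to end.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]
print(top_hessian_eigenvalue(loss, params))
```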

(3/7)

Q2. Do MSAs Act Like Convs?

MSAs and Convs exhibit opposite behaviors, so they are complementary. For example, MSAs are low-pass filters, while Convs are high-pass filters. This suggests that MSAs are shape-biased, whereas Convs are texture-biased.
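(A minimal sketch of the frequency-domain view behind the low-pass/high-pass claim, not the paper's exact analysis: compare the high-frequency log-amplitude of a feature map before and after a layer. `high_freq_log_amplitude` is an assumed helper, and the Conv2d is just a stand-in layer to probe.)

```python
import torch

def high_freq_log_amplitude(feat, cutoff=0.5):
    """Mean log-amplitude of the 2D FFT outside a centered low-frequency box.
    feat: (B, C, H, W) feature map."""
    b, c, h, w = feat.shape
    amp = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1)).abs()
    logamp = torch.log(amp + 1e-8)
    # Mask out the central (low-frequency) region, keep the outer ring.
    mask = torch.ones(h, w, dtype=torch.bool)
    h0, w0 = int(h * (1 - cutoff) / 2), int(w * (1 - cutoff) / 2)
    mask[h0:h - h0, w0:w - w0] = False
    return logamp[..., mask].mean().item()

feat_in = torch.randn(2, 64, 32, 32)             # feature map entering a block
block = torch.nn.Conv2d(64, 64, 3, padding=1)    # stand-in: probe any block here
feat_out = block(feat_in)
print(high_freq_log_amplitude(feat_out) - high_freq_log_amplitude(feat_in))
# A negative difference means the block attenuated high frequencies (low-pass-like);
# a positive one means it amplified them (high-pass-like).
```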

(4/7)

Q3. How Can We Harmonize MSAs With Convs?

MSAs at the end of a stage (not a model) play a key role. We thus introduce AlterNet by replacing Convs at the end of a stage with MSAs. AlterNet outperforms CNNs not only in large data regimes but also in small data regimes.

(5/7)

So how do you apply MSAs to your own CNN model?

1. Alternately replace Conv blocks with MSA blocks, starting from the end of a baseline CNN.
2. If an added MSA block does not improve predictive performance, replace a Conv block at the end of an earlier stage with an MSA block instead (see the sketch below).
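(A minimal sketch of step 1, assuming a torchvision ResNet-50 as the baseline; the `SpatialMSABlock` module is illustrative, not the paper's exact AlterNet code. It swaps the last Conv block of a stage for a pre-norm MSA block that attends over the H*W spatial positions of the feature map.)

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SpatialMSABlock(nn.Module):
    """Pre-norm MSA over the H*W spatial positions of a CNN feature map."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed)     # self-attention
        tokens = tokens + out                          # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

model = resnet50(weights=None)
# Stage 3 ("layer3") keeps 1024 channels in its last block, so swapping it
# for an MSA block of the same width preserves the feature-map shape.
model.layer3[-1] = SpatialMSABlock(dim=1024, num_heads=8)

x = torch.randn(2, 3, 224, 224)
print(model(x).shape)                                  # torch.Size([2, 1000])
```

If this swap does not help, step 2 says to move one stage earlier (e.g. try the last block of `layer2`) rather than stacking more MSAs at the same place.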

(6/7)

In summary, MSA ≠ Conv with weak inductive bias.
The self-attention formulation is ANOTHER inductive bias that complements Convs.

slide: bit.ly/3gNkV7e

(7/7)
