Namuk Park
Deep Learning Research Scientist
Feb 15, 2022
Our paper “How Do Vision Transformers Work?” was accepted as a Spotlight at #ICLR2022!!

We show that the success of ViTs is NOT due to their weak inductive bias & their ability to capture long-range dependencies.

paper: openreview.net/forum?id=D78Go…
code & summary: github.com/xxxnell/how-do…

👇 (1/7) We address the following three key questions about multi-head self-attentions (MSAs) and ViTs (a minimal MSA sketch follows below):

Q1. What properties of MSAs do we need to better optimize NNs? 
Q2. Do MSAs act like Convs? If not, how are they different?
Q3. How can we harmonize MSAs with Convs?

(2/7)
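For readers unfamiliar with the block Q1–Q3 contrast with Convs, here is a minimal multi-head self-attention (MSA) sketch in PyTorch. It is a generic illustration, not the paper's implementation (see the GitHub repo above for that); `dim`, `num_heads`, and the toy shapes are assumptions for the example.

```python
# Minimal multi-head self-attention (MSA) sketch -- generic illustration,
# NOT the paper's code. `dim` and `num_heads` are hypothetical parameters.
import torch
import torch.nn as nn

class MSA(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Every token attends to every other token,
        # which is the "long-range dependency" the thread refers to.
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        attn = attn.softmax(dim=-1)                     # attention over all tokens
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

x = torch.randn(2, 16, 64)   # 2 images, 16 tokens, 64-dim embeddings
print(MSA()(x).shape)        # torch.Size([2, 16, 64])
```

Note the contrast with a Conv: the attention map mixes all tokens with data-dependent weights, whereas a Conv mixes only a fixed local neighborhood. This global, input-dependent mixing is exactly what Q1–Q3 probe.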