*MLP-Mixer: An all-MLP Architecture for Vision*

It's all over Twitter!

A new, cool architecture that mixes several ideas from MLPs, CNNs, ViTs, trying to keep it as simple as possible.

Small thread below. 👇 /n
The idea is strikingly simple:

(i) transform an image into a sequence of patches;
(ii) alternate between applying an MLP to each patch (across its features) and an MLP to each feature (across all patches).

Mathematically, this amounts to alternately applying one MLP along the rows and another along the columns of the patches-by-features matrix. /n
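To make the rows-and-columns picture concrete, here is a minimal pure-Python sketch of one mixing block. It is not the authors' code: the helper names (`patchify`, `make_mlp`, `mixer_block`), the random weights, and the toy 4x6 grayscale image are all assumptions for illustration. The sketch keeps the GELU and the skip connections, but leaves out the layer normalization and the initial linear projection of the patches that the paper uses.

```python
import math
import random


def patchify(image, p):
    """Split an H x W grayscale image (list of rows) into flattened p x p patches.
    Assumes H and W are divisible by p."""
    H, W = len(image), len(image[0])
    return [
        [float(image[i + di][j + dj]) for di in range(p) for dj in range(p)]
        for i in range(0, H, p)
        for j in range(0, W, p)
    ]


def linear(x, w, b):
    """y = x @ w + b for a single vector x; w has shape len(x) x len(b)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def make_mlp(dim, hidden, rng):
    """Random weights for a two-layer MLP mapping dim -> hidden -> dim (hypothetical init)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": mat(dim, hidden), "b1": [0.0] * hidden,
            "w2": mat(hidden, dim), "b2": [0.0] * dim}


def mlp(x, params):
    h = [gelu(v) for v in linear(x, params["w1"], params["b1"])]
    return linear(h, params["w2"], params["b2"])


def transpose(m):
    return [list(col) for col in zip(*m)]


def mixer_block(X, token_mlp, channel_mlp):
    """X is the patches-by-features matrix: one row per patch, one column per feature."""
    # Token mixing: the same MLP is applied to every column (one feature across all patches).
    mixed = transpose([mlp(col, token_mlp) for col in transpose(X)])
    X = [[a + b for a, b in zip(row, m)] for row, m in zip(X, mixed)]  # skip connection
    # Channel mixing: the same MLP is applied to every row (one patch across its features).
    return [[a + b for a, b in zip(row, mlp(row, channel_mlp))] for row in X]


rng = random.Random(0)
image = [[r * 6 + c for c in range(6)] for r in range(4)]  # toy 4 x 6 "image"
X = patchify(image, 2)                                     # 6 patches, 4 features each
token_mlp = make_mlp(len(X), 8, rng)       # acts along the patch axis
channel_mlp = make_mlp(len(X[0]), 8, rng)  # acts along the feature axis
Y = mixer_block(X, token_mlp, channel_mlp)
```

The output `Y` has the same shape as `X`, so blocks like this can be stacked. Note that the token-mixing MLP is tied to the number of patches, which is why the input resolution is fixed in this sketch.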
There has been some discussion (and memes!) sparked by this tweet from @ylecun, because several components can be interpreted (or implemented) as convolutional layers (e.g., 1x1 convolutions).

So, not a CNN, but definitely not a "simple MLP" either. /n
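The 1x1 convolution point is easy to check numerically. Below is a small sketch (again pure Python, with made-up shapes and hypothetical function names `conv2d` and `per_patch_linear`) showing that a naive convolution with a 1x1 kernel gives the same numbers as applying one shared linear layer to every patch independently. Biases and nonlinearities are omitted to keep the comparison bare.

```python
import math
import random


def conv2d(x, kernel):
    """Naive 'valid' convolution with stride 1.
    x has shape H x W x C, kernel has shape k x k x C x D."""
    k = len(kernel)
    H, W = len(x), len(x[0])
    C, D = len(kernel[0][0]), len(kernel[0][0][0])
    return [
        [
            [
                sum(
                    x[i + di][j + dj][c] * kernel[di][dj][c][d]
                    for di in range(k) for dj in range(k) for c in range(C)
                )
                for d in range(D)
            ]
            for j in range(W - k + 1)
        ]
        for i in range(H - k + 1)
    ]


def per_patch_linear(patches, w):
    """Apply the same C x D weight matrix to every patch (row) independently."""
    return [
        [sum(p[c] * w[c][d] for c in range(len(w))) for d in range(len(w[0]))]
        for p in patches
    ]


rng = random.Random(0)
H, W, C, D = 2, 3, 4, 5
x = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
w = [[rng.random() for _ in range(D)] for _ in range(C)]

conv_out = conv2d(x, [[w]])  # wrap w as a 1 x 1 x C x D kernel
flat_conv = [conv_out[i][j] for i in range(H) for j in range(W)]
flat_linear = per_patch_linear([x[i][j] for i in range(H) for j in range(W)], w)

same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(flat_conv, flat_linear)
    for a, b in zip(row_a, row_b)
)
```

Here `same` comes out true: with a 1x1 kernel, the convolution never looks at neighboring positions, so it reduces to a per-patch linear map with shared weights.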

The results, as you would guess from the Twitter virality, are good, especially when you go to extremely large sizes.

In what appears to have become a standard, many results are from the internal JFT-300M dataset, which is definitely not good for reproducibility. /n
I have seen many discussions about the "smaller inductive bias" of this architecture.

If we go by code and simplicity, this might be true, but I honestly find it extremely hard to understand "how much" architectural bias or properties we have here. /n
Anyway, paper is here: arxiv.org/pdf/2105.01601…

Already quite a lot of implementations: paperswithcode.com/paper/mlp-mixe…

Video by @ykilcher :

/n
Kudos to all authors! Judging by the amount of interest and discussion, this is already set to become a strong baseline (or more?) against which to compare.

@neilhoulsby @tolstikhini @__kolesnikov__ @giffmana @XiaohuaZhai @TomUnterthiner @JessicaYung17 @keysers @kyosu @MarioLucic_
