Sayak Paul · Apr 18 · 5 tweets · 4 min read
What do Vision Transformers learn? How do they encode anything useful for image recognition? In our latest work, we reimplement several methods from this line of research & use them to investigate various ViT model families (DeiT, DINO, the original ViT, etc.).

Done w/ @ariG23498

1/
We also reimplemented different models in #Keras. These were first populated w/ pre-trained parameters & were then evaluated to ensure correctness.

Code, models, a tutorial, interactive demos (w/ @huggingface Spaces), visuals:

github.com/sayakpaul/prob…

2/
We’ve used the following methods for our analysis:

* Attention rollout
* Classic heatmap of the attention weights
* Mean attention distance
* Viz of the positional embeddings & linear projections

We hope our work turns out to be a useful resource for those studying ViTs.

3/
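Of the methods above, mean attention distance is easy to sketch: for each query token, average the spatial distance to every key token, weighted by the attention weights. A minimal numpy version (the 14x14 grid and 16-px patches are illustrative assumptions for a 224px input, not tied to any particular checkpoint):

```python
import numpy as np

def mean_attention_distance(attn, patch_size=16, grid=14):
    """attn: (num_patches, num_patches) attention weights for one head,
    rows summing to 1 (CLS token excluded for simplicity). Returns the
    mean distance, in pixels, over which the head attends."""
    # 2-D pixel coordinates of each patch centre.
    coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"), -1)
    coords = coords.reshape(-1, 2) * patch_size
    # Pairwise Euclidean distances between all patches.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Attention-weighted distance per query, averaged over queries.
    return (attn * dist).sum(-1).mean()

n = 14 * 14
print(mean_attention_distance(np.eye(n)))  # 0.0: each token attends only to itself
```

A head with purely local attention scores near 0, while a head attending uniformly over the whole image scores near the mean inter-patch distance, which is the "local-to-global" signature reported for ViTs.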
We’ve also built a @huggingface organization around our experiments. The organization holds the Keras pre-trained models & Spaces where you can try the visualizations on your own images.

huggingface.co/probing-vits

Contributions are welcome :)

4/
We thank @fchollet for his helpful guidance on the tutorial. We thank @jarvislabsai & @GoogleDevExpert for providing us with credit support that allowed the experiments.

Thanks to @ritwik_raha for helping us with this amazing visual.

5/

More from @RisingSayak

Jan 31
Wanna try out the ConvNeXt models in @TensorFlow / #keras, inspect them, fine-tune them, etc.? Well, here they are: github.com/sayakpaul/Conv…

Includes a total of 15 ImageNet-1k and ImageNet-21k ConvNeXt models + conversion scripts, off-the-shelf inference, and fine-tuning code.

1/
These models ARE NOT opaque. You can load them like so and inspect whatever you need to:

2/
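The code screenshot didn't survive the unroll. Here's a hedged stand-in showing the kind of inspection meant; the tiny Sequential model below is a placeholder (the repo actually ships ConvNeXt checkpoints, which you'd load with `tf.keras.models.load_model(...)` and walk the same way):

```python
import tensorflow as tf

# Placeholder model; a real ConvNeXt checkpoint would be loaded with
# tf.keras.models.load_model("<path-to-savedmodel>") and inspected identically.
model = tf.keras.Sequential(
    [
        tf.keras.Input((32, 32, 3)),
        tf.keras.layers.Conv2D(8, 3, activation="relu", name="stem"),
        tf.keras.layers.GlobalAveragePooling2D(name="pool"),
        tf.keras.layers.Dense(10, name="head"),
    ]
)

# Nothing is opaque: walk the layers and pull out weights as needed.
for layer in model.layers:
    print(layer.name, [tuple(w.shape) for w in layer.get_weights()])

# Or build a feature extractor on top of any intermediate layer.
features = tf.keras.Model(model.inputs, model.get_layer("stem").output)
print(features(tf.zeros((1, 32, 32, 3))).shape)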
Here's the full disclosure on the accuracy scores. The differences mainly stem from implementation details that vary between libraries. Happy to stand corrected if someone has other suggestions.

3/
Jan 18
Q: If I implement a paper, there are likely lots of implementations of that already existing. How do I make it worthwhile?

⬇️
1> I think the learning aspect should precede any other aspect in this regard. Whether or not it's gonna be worthwhile shouldn't matter if you are up for the learning challenge.
2> But if you want to make the implementation a part of your project portfolio, the following things could be helpful.

2.1> You could pick up papers that are a bit off the grid from the conventional ones while still being in your territory. You should also enjoy working on it.
Dec 23, 2021
Implementing a paper is helpful in so many ways. You get to:

* Know the work inside out including the implementation details.
* Study amazing resources to further your understanding.
* Read a lot of code for references. Sometimes, the official codebases are amazing.

1/
Oftentimes, an idea seems fairly simple but when it comes to implementation details, things start to get messier. This is the learning, folks!

If the original impl. is messy, you might be able to make it elegant, simpler, and in turn, better.

2/
For me, implementing existing works has helped me become a better practitioner and a stronger believer in the work. It's almost never easy, but that's the real fun. It boosts your confidence and your knowledge.

3/
Nov 1, 2021
@soumikRakshit96 and I have been working on this project for a while now. Today, we are delighted to share our progress.

Point cloud segmentation in the wild with @TensorFlow:

github.com/soumik12345/po…

1/
Our repository comes with full TPU support. You can also use multiple GPUs with mixed-precision (when supported). Here's a blog post to get started:

keras.io/examples/visio…

Thanks to @fchollet for the reviews.

2/
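The mixed-precision part is a one-line global policy switch in tf.keras. A minimal sketch (the speedup needs recent GPUs/TPUs, but it runs anywhere; the Dense layer is just a stand-in):

```python
import tensorflow as tf

# Compute in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

dense = tf.keras.layers.Dense(4)
y = dense(tf.zeros((1, 8)))
print(y.dtype)             # float16 activations
print(dense.kernel.dtype)  # variables stay float32 for numeric stability
```

In practice you'd also wrap the optimizer with loss scaling (Keras does this automatically for `model.fit`) so small float16 gradients don't underflow.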
We provide standalone scripts and also notebooks for training and testing our models. We open-source all the experimental results and pre-trained models:

github.com/soumik12345/po…

3/
Jun 1, 2021
Recipes that I find to be beneficial when working in low-data/imbalance regimes (vision):

* Use a weighted loss function &/or focal loss.
* Either use simpler/shallower models or use models that are known to work well in these cases. Ex: SimCLRV2, Big Transfer, DINO, etc.

1/n
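The focal-loss bullet above can be sketched in a few lines of numpy (binary case; gamma=2 and alpha=0.25 are the common defaults from the paper):

```python
import numpy as np

def focal_loss(y_true, p, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so
    confident, already-correct examples contribute almost nothing and
    the hard (often minority-class) ones dominate the gradient."""
    p_t = np.where(y_true == 1, p, 1.0 - p)              # prob. of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class balancing term
    return -(alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)).mean()

y = np.array([1, 1, 0, 0])
easy = np.array([0.95, 0.9, 0.1, 0.05])  # confidently correct predictions
hard = np.array([0.55, 0.6, 0.45, 0.4])  # barely correct predictions
print(focal_loss(y, easy) < focal_loss(y, hard))  # True: easy examples are down-weighted
```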
* Use MixUp or CutMix in the augmentation pipeline to relax the space of marginals.
* Ensure a certain percentage of minority-class data is always present in each mini-batch. In @TensorFlow, this can be done using `rejection_resample`.

tensorflow.org/guide/data#rej…

2/n
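The `tf.data.Dataset.rejection_resample` transform does this on a streaming dataset; for intuition, here is a toy numpy version of the underlying idea (accept each example with probability proportional to target / empirical class frequency; the numbers are made up):

```python
import numpy as np

def rejection_resample(labels, target_dist, rng=np.random.default_rng(0)):
    """Toy sketch of rejection resampling: keep each example with
    probability proportional to target_dist[class] / empirical[class],
    so the surviving stream approaches the target class distribution."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    empirical = dict(zip(classes, counts / len(labels)))
    ratio = {c: target_dist[c] / empirical[c] for c in classes}
    scale = max(ratio.values())  # normalise so acceptance probs are <= 1
    accept = {c: ratio[c] / scale for c in classes}
    keep = rng.random(len(labels)) < np.array([accept[c] for c in labels])
    return labels[keep]

# A 90/10 imbalanced stream resampled toward 50/50.
labels = np.array([0] * 900 + [1] * 100)
out = rejection_resample(labels, {0: 0.5, 1: 0.5})
print((out == 1).mean())  # close to 0.5
```

The price is that majority-class examples are discarded; the tf.data version avoids wasting I/O by applying the test early in the pipeline.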
* Use semi-supervised learning recipes that combine the benefits of self-supervision and few-shot learning. Ex: PAWS by @facebookai.
* SWA is generally advised for better generalization, and it's particularly useful in these regimes.

3/n
Apr 26, 2021
New #Keras example is up on *consistency regularization*, an important recipe for semi-supervised learning and for tackling distribution shifts, as shown in *Noisy Student Training*.

keras.io/examples/visio…

1/n
This example provides a template for performing semi-supervised / weakly supervised learning. A few things one can plug right in:

* Incorporate more data while training the student.
* Filter for high-confidence predictions while training the student.

2/n
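The consistency objective at the heart of this template can be sketched in numpy: the frozen teacher predicts on clean inputs, the student predicts on noised/augmented versions, and a divergence between the two prediction distributions is minimized (MSE here; the actual example uses Keras losses, and the logits below are made up):

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def consistency_loss(teacher_logits, student_logits):
    """MSE between the teacher's predictions on clean inputs and the
    student's predictions on noised inputs. The teacher is frozen, so
    its branch is treated as a fixed target."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return ((p_teacher - p_student) ** 2).mean()

teacher = np.array([[2.0, 0.5, -1.0]])
print(consistency_loss(teacher, teacher.copy()))        # 0.0 when they agree
print(consistency_loss(teacher, np.zeros((1, 3))) > 0)  # True: disagreement is penalised
```

This term is added to the usual supervised loss on the labeled subset; unlabeled images contribute only through the consistency term.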
The example uses Stochastic Weight Averaging while training the teacher to induce geometric ensembling. With elements like Stochastic Dropout, the performance might be even better.

Here are the full experiments: git.io/JO55v.