Our new paper, C2D (arxiv.org/abs/2103.13646, github.com/ContrastToDivi…), shows how self-supervised pre-training boosts learning with noisy labels, achieves SOTA performance, and provides in-depth analysis. Authors: @evgeniyzhe, @ChaimBaskin, Avi Mendelson, Alex Bronstein, @orlitany 1/n
The problem of learning with noisy labels (LNL) has great practical importance: a large clean dataset is often expensive or impossible to obtain. Existing approaches to LNL either modify the loss to account for the noise or try to detect the samples with noisy labels. 2/n
Yet, we need a starting point. For that, we use "warm-up": regular training on the full dataset, which relies on the intrinsic robustness of DNNs to noise. The main goals of warm-up are providing a feature extractor and keeping the loss of the noisy samples high, i.e., avoiding memorization. 3/n
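For concreteness, warm-up is nothing more than plain cross-entropy training on the full noisy set. A minimal sketch (the backbone, dataloader, and hyperparameters here are placeholders, not the paper's exact configuration):

```python
# Minimal warm-up sketch: standard cross-entropy training on the full
# (noisy) training set. Model, loader, and hyperparameters are placeholders.
import torch
import torch.nn as nn

def warmup(model, loader, epochs=10, lr=0.02, device="cuda"):
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=5e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:          # y may contain noisy labels
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()
    return model
```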
We believe that the inability to achieve these goals is a significant obstacle to improving performance in learning with noisy labels. While the robustness of the networks allows us to achieve good results, we don't know what its sources are or how to improve it. 4/n
One (very popular) solution is large-scale pre-training, for example on ImageNet. However, outside the natural-image domain, large clean datasets may not exist. Moreover, in some of our experiments ImageNet pre-training actually harmed performance. 5/n
Instead, we propose to use self-supervised pre-training. This way, we do not require external data; by ignoring the labels, we eliminate the influence of noise on the pre-training; and by operating on the training set itself, we avoid a domain gap. This can be combined with any LNL method. 6/n
Our proposed framework, which we call Contrast to Divide, or C2D, comprises two stages: self-supervised pre-training (the contrast phase), followed by a standard algorithm for learning with noisy labels, which can now enjoy a better initialization. 7/n
Tested with two SOTA methods, DivideMix and ELR+, C2D shows huge boosts in both real-life and synthetic cases. For mini-WebVision, we improve by more than 3% on both the WebVision (79.42%) and ImageNet (78.57%) validation sets. For CIFAR-100 with 90% noise, we improve from 34% to 64%. 8/n
For the Clothing1M dataset, the default approach is to use an ImageNet-pre-trained ResNet-50 as initialization. C2D was able to match state-of-the-art performance (74.58±0.15% compared to 74.81%) without any external data. 9/n
For analysis, we started by looking at UMAP embeddings of the features for CIFAR-10. We took the features at the end of warm-up for 20% and 90% noise, with and without C2D, and the features at the end of self-supervised pre-training. C2D enjoys much better separability in high-noise settings. 10/n
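The visualization boils down to embedding penultimate-layer features with UMAP. A sketch, assuming umap-learn is installed and `features` / `labels` are pre-extracted arrays (not the paper's exact plotting script):

```python
# UMAP of penultimate-layer features collected after warm-up.
# Assumes `features` is an (N, D) numpy array and `labels` are the true
# CIFAR-10 classes (used only for coloring the scatter plot).
import umap
import matplotlib.pyplot as plt

emb = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
plt.title("UMAP of features after warm-up")
plt.show()
```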
To quantify the warm-up goals, we measure loss separability (as the ROC-AUC of a GMM fitted on the loss values) and feature-extraction quality (evaluated with a linear classifier) on CIFAR-100. On both, C2D provides a significant boost over ImageNet pre-training and traditional warm-up. 11/n
[Plot: ROC-AUC vs. linear accuracy after warm-up]
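The loss-separability metric can be computed along these lines (my sketch of the idea, not the exact evaluation script; it assumes you know the ground-truth noise flags, which is true for synthetic noise):

```python
# Loss separability: fit a 2-component GMM on per-sample losses after
# warm-up and score how well it recovers the clean/noisy split (ROC-AUC).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

def loss_separability(losses, is_noisy):
    """losses: (N,) per-sample CE losses; is_noisy: (N,) 0/1 ground-truth flags."""
    losses = np.asarray(losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, reg_covar=1e-4).fit(losses)
    noisy_comp = int(np.argmax(gmm.means_.ravel()))  # higher-mean component ~ noisy
    p_noisy = gmm.predict_proba(losses)[:, noisy_comp]
    return roc_auc_score(is_noisy, p_noisy)
```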
We also looked at the distribution of the loss values for clean and noisy samples at the end of warm-up. Interestingly, ImageNet pre-training appears to have an even larger overlap than no pre-training, and both are significantly worse than C2D. 12/n
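A sketch of that comparison, assuming numpy arrays `losses` (per-sample loss after warm-up) and `is_noisy` (0/1 ground-truth flags) are already collected:

```python
# Overlay the loss histograms of clean vs. noisy samples after warm-up.
import matplotlib.pyplot as plt

plt.hist(losses[is_noisy == 0], bins=50, alpha=0.5, density=True, label="clean")
plt.hist(losses[is_noisy == 1], bins=50, alpha=0.5, density=True, label="noisy")
plt.xlabel("per-sample loss after warm-up")
plt.legend()
plt.show()
```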
Finally, we compared C2D with DivideMix to MixMatch. You can think of MixMatch as DivideMix provided with an oracle that knows which samples are noisy. Impressively, MixMatch is not better than C2D, and even with self-supervised pre-training for MixMatch, the difference is only 2%. 13/n
Another interesting phenomenon is that we were unable to achieve good results on CIFAR with ImageNet pre-training. While self-supervised pre-training worked with both DivideMix and ELR+ almost out of the box, we were not able to get good results with ImageNet initialization. 14/n
To conclude, C2D is a simple and efficient way to boost learning with noisy labels. Our code is available, but you don't really need it: just train SimCLR on your dataset and use it as the initialization for your current noise-fighting method. It will probably work better. 15/n, n=15
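If you want to try that recipe, the initialization step can look roughly like this. The checkpoint path and the "encoder." key prefix are assumptions that depend on which SimCLR implementation you trained with:

```python
# Rough recipe for using a SimCLR checkpoint as initialization for an LNL
# method. Checkpoint path and key prefix are assumptions -- adapt them to
# the SimCLR implementation you used.
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=100)                 # backbone your LNL method expects
ckpt = torch.load("simclr_checkpoint.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Keep encoder weights only; drop the projection head and leave the final
# classification layer randomly initialized.
encoder = {k.replace("encoder.", "", 1): v for k, v in state.items()
           if k.startswith("encoder.") and not k.startswith("encoder.fc")}
missing, unexpected = model.load_state_dict(encoder, strict=False)
print("missing:", missing, "unexpected:", unexpected)
# Now pass `model` to DivideMix / ELR+ (or your method of choice) as usual.
```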
