Our new paper, C2D (arxiv.org/abs/2103.13646, github.com/ContrastToDivi…), shows how self-supervised pre-training boosts learning with noisy labels, achieves SOTA performance, and provides in-depth analysis. Authors: @evgeniyzhe, @ChaimBaskin, Avi Mendelson, Alex Bronstein, @orlitany 1/n
The problem of learning with noisy labels (LNL) has great practical importance: a large clean dataset is often expensive or impossible to obtain. Existing approaches to LNL either modify the loss to account for the noise or try to detect the samples with noisy labels. 2/n
Yet, we need a starting point. For that, we use "warm-up": regular training on the full dataset, which relies on the intrinsic robustness of DNNs to noise. The main goals of warm-up are providing a feature extractor and keeping the loss of the noisy samples high, i.e., avoiding memorization. 3/n
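For concreteness, warm-up is nothing more than plain cross-entropy training on the full noisy set. A minimal sketch (the backbone, dataloader, and hyperparameters here are placeholders, not the paper's exact configuration):

```python
# Minimal warm-up sketch: standard cross-entropy training on the full
# (noisy) training set. Model, loader, and hyperparameters are placeholders.
import torch
import torch.nn as nn

def warmup(model, loader, epochs=10, lr=0.02, device="cuda"):
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=5e-4)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:          # y may contain noisy labels
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            ce(model(x), y).backward()
            opt.step()
    return model
```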
We believe that the inability to achieve these goals is a significant obstacle to improving performance in learning with noisy labels. While the robustness of the networks allows us to achieve good results, we don't know what its sources are or how to improve it. 4/n
One (very popular) solution is large-scale pre-training, for example on ImageNet. However, outside the natural-image domain, large clean datasets may not exist. Moreover, in some of our experiments ImageNet pre-training actually harmed performance. 5/n
Instead, we propose to use self-supervised pre-training. This way, we do not require external data; by ignoring the labels, we eliminate the influence of noise on the pre-training; and by operating on the training set itself, we avoid a domain gap. This can be combined with any LNL method. 6/n
Our proposed framework, which we call Contrast to Divide, or C2D, comprises two stages: self-supervised pre-training (the contrast phase), followed by a standard algorithm for learning with noisy labels, which can now enjoy a better initialization. 7/n
Tested with two SOTA methods, DivideMix and ELR+, C2D shows huge boosts in both real-life and synthetic cases. For mini-WebVision, we improve by more than 3% on both the WebVision (79.42%) and ImageNet (78.57%) validation sets. For CIFAR-100 with 90% noise, we improve from 34% to 64%. 8/n
For the Clothing1M dataset, the default approach is to use an ImageNet-pre-trained ResNet-50 as initialization. C2D was able to match state-of-the-art performance (74.58±0.15% compared to 74.81%) without any external data. 9/n
For analysis, we started by looking at UMAP embeddings of the features for CIFAR-10. We took the features at the end of warm-up for 20% and 90% noise, with and without C2D, and the features at the end of self-supervised pre-training. C2D enjoys much better separability in high-noise settings. 10/n
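The visualization boils down to embedding penultimate-layer features with UMAP. A sketch, assuming umap-learn is installed and `features` / `labels` are pre-extracted arrays (not the paper's exact plotting script):

```python
# UMAP of penultimate-layer features collected after warm-up.
# Assumes `features` is an (N, D) numpy array and `labels` are the true
# CIFAR-10 classes (used only for coloring the scatter plot).
import umap
import matplotlib.pyplot as plt

emb = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
plt.title("UMAP of features after warm-up")
plt.show()
```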
To quantify the warm-up goals, we measure loss separability (as the ROC-AUC of a GMM fitted on the loss values) and feature-extraction quality (evaluated with a linear classifier) on CIFAR-100. On both, C2D provides a significant boost over ImageNet pre-training and traditional warm-up. 11/n
[Plot: ROC-AUC vs. linear accuracy after warm-up]
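The loss-separability metric can be computed along these lines (my sketch of the idea, not the exact evaluation script; it assumes you know the ground-truth noise flags, which is true for synthetic noise):

```python
# Loss separability: fit a 2-component GMM on per-sample losses after
# warm-up and score how well it recovers the clean/noisy split (ROC-AUC).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import roc_auc_score

def loss_separability(losses, is_noisy):
    """losses: (N,) per-sample CE losses; is_noisy: (N,) 0/1 ground-truth flags."""
    losses = np.asarray(losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, reg_covar=1e-4).fit(losses)
    noisy_comp = int(np.argmax(gmm.means_.ravel()))  # higher-mean component ~ noisy
    p_noisy = gmm.predict_proba(losses)[:, noisy_comp]
    return roc_auc_score(is_noisy, p_noisy)
```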
We also looked at the distribution of the loss values for clean and noisy samples at the end of warm-up. Interestingly, ImageNet pre-training appears to have an even larger overlap than no pre-training, and both are significantly worse than C2D. 12/n
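A sketch of that comparison, assuming numpy arrays `losses` (per-sample loss after warm-up) and `is_noisy` (0/1 ground-truth flags) are already collected:

```python
# Overlay the loss histograms of clean vs. noisy samples after warm-up.
import matplotlib.pyplot as plt

plt.hist(losses[is_noisy == 0], bins=50, alpha=0.5, density=True, label="clean")
plt.hist(losses[is_noisy == 1], bins=50, alpha=0.5, density=True, label="noisy")
plt.xlabel("per-sample loss after warm-up")
plt.legend()
plt.show()
```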
Finally, we compared C2D with DivideMix to MixMatch. You can think of MixMatch as DivideMix provided with an oracle that knows which samples are noisy. Impressively, MixMatch is not better than C2D, and even with self-supervised pre-training for MixMatch, the difference is only 2%. 13/n
Another interesting phenomenon is that we were unable to achieve good results on CIFAR with ImageNet pre-training. While self-supervised pre-training worked with both DivideMix and ELR+ almost out of the box, we were not able to get good results with ImageNet initialization. 14/n
To conclude, C2D is a simple and efficient way to boost learning with noisy labels. Our code is available, but you don't really need it: just train SimCLR on your dataset and use it as the initialization for your current noise-fighting method. It will probably work better. 15/n, n=15
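If you want to try that recipe, the initialization step can look roughly like this. The checkpoint path and the "encoder." key prefix are assumptions that depend on which SimCLR implementation you trained with:

```python
# Rough recipe for using a SimCLR checkpoint as initialization for an LNL
# method. Checkpoint path and key prefix are assumptions -- adapt them to
# the SimCLR implementation you used.
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=100)                 # backbone your LNL method expects
ckpt = torch.load("simclr_checkpoint.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Keep encoder weights only; drop the projection head and leave the final
# classification layer randomly initialized.
encoder = {k.replace("encoder.", "", 1): v for k, v in state.items()
           if k.startswith("encoder.") and not k.startswith("encoder.fc")}
missing, unexpected = model.load_state_dict(encoder, strict=False)
print("missing:", missing, "unexpected:", unexpected)
# Now pass `model` to DivideMix / ELR+ (or your method of choice) as usual.
```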
