"torch.manual_seed(3407) is all you need"!
Sorry for the title. I promise it's not (entirely) just for trolling. It's my little spare time project of this summer to investigate unaccounted randomness in #ComputerVision and #DeepLearning.
🧵👇 1/n
The idea is simple: after years of reviewing deep learning stuff, I am frustrated at never seeing a paragraph that shows how robust the results are w.r.t. randomness (initial weights, batch composition, etc). 2/n
After seeing several videos by @skdh about how experimental physics claims tend to disappear through repetition, I got the idea of gauging the influence of randomness by scanning a large number of seeds. 3/n
I have two setups (limited by my available computing budget): On CIFAR 10, I scanned 10^4 seeds using a fast ResNet architecture, randomly initialized. On ImageNet, I used pretrained models and fine-tuned them, scanning 50 seeds. 4/n
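To make "scanning seeds" concrete, here is a minimal sketch of what such a scan could look like. It is not the code behind the report: `train_and_evaluate` is a hypothetical stand-in that returns a simulated accuracy instead of training a network, and the 90% mean and 0.2 point spread are made-up numbers for illustration only.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for one full training run.

    A real experiment would seed the framework here (e.g. with
    torch.manual_seed(seed)), train the model, and return its test
    accuracy. This version only draws a fake accuracy (in percent)
    from a seeded generator so the sketch runs anywhere.
    """
    rng = random.Random(seed)
    return 90.0 + rng.gauss(0.0, 0.2)  # assumed mean and spread, not real results


def scan_seeds(seeds):
    """Run one 'training' per seed and summarize the spread of accuracies."""
    accuracies = [train_and_evaluate(s) for s in seeds]
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),
        "min": min(accuracies),
        "max": max(accuracies),
    }


summary = scan_seeds(range(100))
print(summary)
```

The point of reporting min and max next to the mean and standard deviation is that the outliers are exactly what a lucky (or unlucky) seed would give you.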
Here is the evolution of the accuracy on CIFAR 10. Dark red is 1σ, and light red is min/max values. So, well after convergence, there are still significant variations. 5/n
The distribution of accuracy is as follows. It's pointy, but has outliers nonetheless. The diff between min/max is just below 2%, which is huge for computer vision! 6/n
For ImageNet, an SSL pretrained ViT has the following distribution. More concentrated, but the min/max diff is just below 0.5%, which I find surprisingly large given all runs start from (almost) the same pretrained initial weights. 7/n
Of course, there are many limitations in this work: accuracy is not state of the art, seed scanning is small for ImageNet, not training from scratch, etc. But it's all I could do in a short amount of time with limited resources. It's all discussed in the draft. 8/n
There are many more results and discussion in the draft report: davidpicard.github.io/pdf/lucky_seed…
(the title is the winning seed among 10k on CIFAR 10)
My conclusions are that: 1) we definitely need a refined large scale version of this with 100x more computation used; 2) many recent results are probably overestimating accuracy because of implicit seed selection (I discuss 2 ways it can happen in the report). 10/n
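A toy illustration of why seed selection inflates reported accuracy: if you run k seeds and report the best one, the reported number is biased upward compared to a single honest run. This is a sketch of that generic mechanism only, not necessarily either of the two ways discussed in the report. The accuracies are simulated from a Gaussian with an assumed mean of 90% and spread of 0.2 points, and `best_of_k_bias` is a hypothetical helper name.

```python
import random
import statistics


def simulated_accuracy(rng: random.Random) -> float:
    """Fake accuracy (in percent) of one run; assumed Gaussian, not real data."""
    return rng.gauss(90.0, 0.2)


def best_of_k_bias(k: int, trials: int = 2000, seed: int = 0) -> float:
    """Average gap between 'best of k runs' and 'one run', in accuracy points."""
    rng = random.Random(seed)
    # What you would report from a single, unselected run.
    single = [simulated_accuracy(rng) for _ in range(trials)]
    # What you would report if you kept only the best of k runs.
    best = [
        max(simulated_accuracy(rng) for _ in range(k))
        for _ in range(trials)
    ]
    return statistics.mean(best) - statistics.mean(single)


bias = best_of_k_bias(k=5)
print(f"best-of-5 inflates the toy accuracy by about {bias:.2f} points")
```

Even with only 5 tries, the gap in this toy model is of the same order as the spread between runs, which is the kind of margin many papers claim as an improvement.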
Anyway, this was my fun summer break project. If there's interest, I'll put the report on arxiv (to track updates). Code is already on github, but it's private because it's a mess.
Happy to have your feedback on this, especially if I did something completely wrong! 🥸
11/n, n=11.
