"torch.manual_seed(3407) is all you need"!
draft 📜: davidpicard.github.io/pdf/lucky_seed…
Sorry for the title. I promise it's not (entirely) just for trolling. It's my little spare-time project this summer: investigating unaccounted-for randomness in #ComputerVision and #DeepLearning.
🧵👇 1/n
The idea is simple: after years of reviewing deep learning stuff, I am frustrated at never seeing a paragraph that shows how robust the results are w.r.t. randomness (initial weights, batch composition, etc). 2/n
After seeing several videos by @skdh about how experimental physics claims tend to disappear through repetition, I got the idea of gauging the influence of randomness by scanning a large number of seeds. 3/n
I have two setups (limited by my available computing budget): On CIFAR 10, I scanned 10^4 seeds using a fast ResNet architecture, randomly initialized. On ImageNet, I used pretrained models and fine-tuned them, scanning 50 seeds. 4/n
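For readers who want to try this at home, here is a minimal sketch of what a seed scan could look like. It is not the code behind the report: `train_and_evaluate` is a hypothetical stand-in that just draws a noisy number (the 90.0 and 0.3 are made up), so the script runs without a GPU. In a real PyTorch run, that function would start by calling `torch.manual_seed(seed)` and then train and test the model.

```python
import random


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for one full training run.

    A real version would seed the framework (e.g. torch.manual_seed(seed)),
    train the model, and return its test accuracy. Here we only simulate a
    seed-dependent accuracy so the sketch is runnable anywhere.
    """
    rng = random.Random(seed)
    # Made-up numbers: 90% average accuracy, 0.3 points of seed noise.
    return 90.0 + rng.gauss(0.0, 0.3)


def scan_seeds(seeds):
    """Run one training per seed and collect accuracy by seed."""
    return {seed: train_and_evaluate(seed) for seed in seeds}


results = scan_seeds(range(50))
best_seed = max(results, key=results.get)
print(f"scanned {len(results)} seeds, best seed: {best_seed}")
```

The only design point that matters is that everything random in a run is derived from the one seed, so the same seed always gives the same accuracy.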
Here is the evolution of the accuracy on CIFAR 10. Dark red is 1σ, and light red is min/max values. So, well after convergence, there are still significant variations. 5/n
The distribution of accuracy is as follows. It's pointy, but has outliers nonetheless. The diff between min/max is just below 2%, which is huge for computer vision! 6/n
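The quantities quoted here (mean, 1σ, min/max gap) are easy to compute once you have one accuracy per seed. A small sketch, assuming the accuracies are already collected in a list; the five values below are invented for illustration and are not results from the report.

```python
import statistics


def summarize(accuracies):
    """Summarize a list of per-seed accuracies (in percent)."""
    accs = list(accuracies)
    return {
        "mean": statistics.fmean(accs),
        "std": statistics.pstdev(accs),    # population std, i.e. the 1-sigma band
        "min": min(accs),
        "max": max(accs),
        "spread": max(accs) - min(accs),   # the min/max gap discussed above
    }


# Invented example values, NOT numbers from the report.
example = [90.1, 90.4, 89.8, 90.9, 90.3]
summary = summarize(example)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Reporting the spread next to the mean is the point: a single accuracy number hides how far a lucky or unlucky seed can land from the average.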
For ImageNet, an SSL pretrained ViT has the following distribution. More concentrated, but the min/max diff is just below 0.5%, which I find surprisingly large given all runs start from (almost) the same pretrained initial weights. 7/n
Of course, there are many limitations in this work: accuracy is not state of the art, seed scanning is small for ImageNet, not training from scratch, etc. But it's all I could do in a short amount of time with limited resources. It's all discussed in the draft. 8/n
There are many more results and discussion in the draft report: davidpicard.github.io/pdf/lucky_seed…
(the title is the winning seed among 10k on CIFAR 10)
My conclusions are that: 1) we definitely need a refined large scale version of this with 100x more computation; 2) many recent results are probably overestimating accuracy because of implicit seed selection (I discuss 2 ways it can happen in the report). 10/n
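To get a feel for why implicit seed selection inflates reported accuracy, here is a toy simulation. It assumes per-seed accuracy is Gaussian around a true mean, which is a simplification and not a claim from the report; the function name and all parameters are made up for the example. It measures how much "keep the best of k runs" overshoots the true mean.

```python
import random
import statistics


def best_of_k_bias(k, trials=2000, mean=90.0, std=0.3, seed=0):
    """Average overestimate when reporting the best of k seeds.

    Assumption (for illustration only): per-seed accuracy follows a
    Gaussian with the given mean and std.
    """
    rng = random.Random(seed)
    best = [
        max(rng.gauss(mean, std) for _ in range(k))
        for _ in range(trials)
    ]
    return statistics.fmean(best) - mean


for k in (1, 5, 10, 50):
    print(f"best of {k:>2} seeds overestimates by {best_of_k_bias(k):.2f} points")
```

With k=1 the bias is essentially zero, and it grows with k, so even a handful of informal reruns can add a few tenths of a point under these assumptions.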
Anyway, this was my fun summer break project. If there's interest, I'll put the report on arXiv (to track updates). Code is already on GitHub, but it's private because it's a mess.
Happy to have your feedback on this, especially if I did something completely wrong! 🥸
11/n, n=11.
