gwern @gwern, 15 tweets
One interesting thing about deep learning is that even as ever-better results surface, everything we know about NNs is probably wrong. A short list (in rough chronological order):

- "you need to pretrain a NN"
- "NNs require thousands of datapoints to train"
- "NNs must be trained by backpropagation"
- "hybrid approaches like SVMs on top of NN features will always work better"
- "backpropagation in any form is biologically implausible"
- "CNNs are nothing like the human visual cortex & certainly don't predict its activations"
- "small NNs can't be trained directly, so NNs must need to be big"
- "NNs only learn task-specific features, certainly no kind of hidden or latent 'dark knowledge'"
- [style transfer arrives] "Who ordered that?"
- "simple SGD is the worst update rule"
- "simple self-supervision like next-frame prediction can't learn semantics"
- "adversarial examples are easy to fix & won't transfer; well, won't black-box transfer; well, won't transfer to the real world; well..."
- [batchnorm arrives] "Oops."
- "big NNs overfit by memorizing data"
- "you can't train 1000-layer NNs but that's OK, that wouldn't be useful anyway"
- "big minibatches don't generalize"
- "NNs aren't Bayesian at all"
- "convolutions are only good for images; only LSTM RNNs can do translation/seq2seq/generation/meta-learning"
- "you need small learning rates, not super-high ones, to get fast training" (superconvergence)
- "memory/discrete choices aren't differentiable"
- [CycleGAN arrives] "Who ordered that?"
- "you can't learn to generate raw audio, it's too low-level"
- "you need bilingual corpora to learn translation"
- "you need shortcut connections, not new activations or initializations, to train 1000-layer nets"
- "NNs can't do zero-shot or few-shot learning"
- "NNs can't do planning, symbolic reasoning, or deductive logic"
- "NNs can't do causal reasoning"
- "pure self-play is unstable and won't work"
- "only convolutions and LSTM RNNs can do (translation/...), not feedforward NNs with attention"
- "learning deep environment models is unstable and won't work"
- "OK maybe we do need pretraining, er, 'warmup', after all"
- "we need hierarchical RL to learn long-term strategies" (not bruteforce PPO)
- "you can't reuse minibatches for faster training, it never works"
- ...
- "NNs are inherently lossy as they work by abstracting and compressing; certainly they can't be invertible"
- "resnets are shallow ensembles, *obviously*"
(As @karpathy says, 'neural networks want to work' - and they are very patient with us as we figure out every possible way to train them orders of magnitude worse, slower, and bigger than necessary...)
- "batchnorm is for whitening/reducing internal covariate shift, *obviously*"
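One item in the list above can be made concrete: the "superconvergence" result is about one-cycle learning-rate schedules (ramp the LR far above the conventionally "safe" value, then anneal well below it). A minimal sketch, assuming a simple two-phase linear schedule; the function name and the default ratios are illustrative, not from any particular library:

```python
def one_cycle_lr(step, total_steps, max_lr, base_lr=None, final_lr=None):
    """Two-phase linear one-cycle schedule: base_lr -> max_lr -> final_lr.

    First half of training ramps the learning rate up to a very high
    max_lr; the second half anneals it to far below the starting rate.
    """
    base_lr = base_lr if base_lr is not None else max_lr / 10
    final_lr = final_lr if final_lr is not None else base_lr / 100
    half = total_steps // 2
    if step < half:
        # warmup phase: linear ramp base_lr -> max_lr
        return base_lr + (max_lr - base_lr) * step / half
    # annealing phase: linear ramp max_lr -> final_lr
    frac = (step - half) / (total_steps - half)
    return max_lr + (final_lr - max_lr) * frac
```

In practice the schedule is queried once per optimizer step; libraries such as PyTorch ship a cosine-shaped variant of the same idea.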
(Invertibility is not just a parlor trick invented to frustrate information theoreticians - it gives you big memory savings in activations/backprop, interesting differential-equation/dynamical-systems connections, and is even useful in generative models.)
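The memory-savings point can be sketched with an additive-coupling block in the RevNet/NICE style: because the inputs are exactly recoverable from the outputs, backprop can recompute activations on the fly instead of storing them. A toy sketch, assuming plain scalar inputs; the function names and the choice of f/g are illustrative:

```python
import math

def couple_forward(x1, x2, f, g):
    """Additive-coupling reversible block (RevNet/NICE style).

    The outputs determine the inputs exactly, so intermediate
    activations need not be stored for the backward pass.
    """
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def couple_inverse(y1, y2, f, g):
    # Reconstruct the inputs from the outputs alone, in reverse order.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Example: f and g can be arbitrary (even non-invertible) functions;
# the coupling structure makes the whole block invertible regardless.
f = math.tanh
g = lambda z: 0.5 * z
```

Stacking such blocks gives a deep invertible net; this is exactly the trick reversible architectures use to cut activation memory during training.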