Neural networks are getting HUGE. In their @stateofaireport 2020, @NathanBenaich and @soundboy visualized how the number of parameters grew for breakthrough architectures. The result below is staggering.
What can you do to compress neural networks?
👇A thread.
1⃣ Neural network pruning: iteratively removing connections after training. It turns out that in some cases, 90%+ of the weights can be removed without a noticeable performance loss. A code sketch follows the papers below.
A few selected milestone papers:
📰Optimal Brain Damage by @ylecun, John S. Denker, and @SaraASolla. As far as I know, this is the paper that introduced the idea.
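As promised, here is a minimal sketch of magnitude-based pruning using PyTorch's torch.nn.utils.prune module. The toy model and the 90% sparsity level are illustrative placeholders, not a recipe from the paper above.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for a network that has already been trained.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 90% of weights with the smallest magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)

# Make the pruning permanent by folding the mask into the weight tensor.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```

In practice, pruning and fine-tuning are alternated over several rounds so the remaining weights can recover the lost accuracy.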
2⃣ Knowledge distillation: teaching a smaller network to learn the predictions of the big one. Since predictions are available for unlabelled data as well, the student network learns how to generalize like the teacher.
📰 One recent success story with knowledge distillation is DistilBERT from @huggingface. 40% smaller and 60% faster, while retaining 97% of its language understanding capabilities!
🏗️ Since it doesn't require any special tooling, it can be done in any framework. A minimal sketch of the loss follows below.
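Here is a minimal sketch of a standard distillation loss (soft targets from the teacher plus hard labels), assuming the usual PyTorch setup; the temperature T and weight alpha are illustrative values, not the ones used for DistilBERT.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student's softened distribution towards the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard targets: the usual cross-entropy on ground-truth labels, when available.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Random tensors standing in for real teacher/student outputs and labels.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```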
3⃣ Quantization: using integer types instead of float32 for faster computation. There are two flavors: post-training quantization and quantization-aware training. The former is simpler but can cost some accuracy. A sketch of the post-training variant follows below.
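As a sketch, post-training dynamic quantization in PyTorch looks roughly like this; the toy model is a placeholder, and quantization-aware training needs extra setup that is not shown here.

```python
import torch
import torch.nn as nn

# A toy trained model standing in for the real one.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Store Linear weights as int8; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# The quantized model is called exactly like the original.
x = torch.randn(1, 784)
print(quantized(x).shape)
```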
1️⃣ If you struggle to understand determinants, stop what you are doing and check out this video by @3blue1brown. It will make your brain explode.
2⃣ Sometimes, it is hard to figure out what a concept represents by looking at how it is calculated. The determinant of a matrix is computed as a sum over all permutations of the column indices, taking one entry from each row; see the formula below.
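For reference, this is the Leibniz formula, where the sum runs over all permutations σ of {1, …, n} and sgn(σ) is the sign of the permutation:

```latex
\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}
```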
3⃣ However, this definition doesn't reveal anything about what the determinant means. In fact, it is quite simple: it describes how the volume scales under the corresponding linear transformation.
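As a quick example, a diagonal matrix that stretches the plane by 2 horizontally and by 3 vertically maps the unit square to a rectangle of area 6, which is exactly its determinant:

```latex
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},
\qquad
\det(A) = 2 \cdot 3 - 0 \cdot 0 = 6
```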