[🧠 Paper Summary 📚] An interesting paper was recently posted on arXiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021).
The main idea is this: 1/ 🧵
If you have an overparametrized neural network (more params than the # of data points in your dataset) and you train it way past the point where it has memorized the training data (as suggested by a low training loss and a high val loss), all of a sudden the network will...
2/
learn to generalize, as suggested by a rapid decrease in the val loss (colloquially, the network "groks" it, i.e. the NN finally understood/figured it out).
Practitioners usually stop training networks at the first sign of overfitting (as evidenced by an increasing gap between the train and val losses).
3/
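For context, the "small algorithmic datasets" in the title are tables of binary operations, modular addition being the canonical example (the paper's p=97 is used below). A minimal sketch of how such a dataset could be built; the helper name and split logic are my own, not the paper's code:

```python
import itertools
import jax
import jax.numpy as jnp

def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    # All p^2 pairs (a, b); the label is (a + b) mod p.
    pairs = jnp.array(list(itertools.product(range(p), repeat=2)))
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    # Random train/val split over the full operation table.
    perm = jax.random.permutation(jax.random.PRNGKey(seed), len(pairs))
    n_train = int(train_fraction * len(pairs))
    return ((pairs[perm[:n_train]], labels[perm[:n_train]]),
            (pairs[perm[n_train:]], labels[perm[n_train:]]))
```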
Interestingly enough, the authors of the paper forgot to turn off the model training, and that's what led to this paper haha! @OpenAI @exteriorpower
The phenomenon is particularly evident on smaller datasets.
This goes against the conventional wisdom in statistics, which suggests...
4/
that you want to have underparametrized models so that you force the model to learn the rule (and thus generalize to unseen data) instead of memorizing the training dataset.
The trick is that heavily overparametrized models, combined with regularization (especially weight decay), seem to be...
5/
able to "carve out" that simple model from the big overparametrized NN.
Also, once again, there seems to be some correlation between flatter loss regions and good generalization. It may be that weight decay is especially good at steering the optimization process...
6/
towards those flatter regions of the loss landscape.
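To make the recipe concrete, here's a rough sketch of "keep training an overparametrized model with weight decay" on the modular-addition task sketched earlier. The tiny one-hidden-layer MLP and all hyperparameters are illustrative assumptions on my part; the paper itself trains a small transformer with AdamW:

```python
import jax
import jax.numpy as jnp

P, HIDDEN, LR, WEIGHT_DECAY = 97, 512, 1e-3, 1e-2  # illustrative, not the paper's

def init(key):
    k1, k2 = jax.random.split(key)
    return {'w1': jax.random.normal(k1, (2 * P, HIDDEN)) * 0.02,
            'w2': jax.random.normal(k2, (HIDDEN, P)) * 0.02}

def loss_fn(params, pairs, labels):
    # One-hot encode the two operands, score all P possible answers.
    x = jnp.concatenate([jax.nn.one_hot(pairs[:, 0], P),
                         jax.nn.one_hot(pairs[:, 1], P)], axis=-1)
    logp = jax.nn.log_softmax(jax.nn.relu(x @ params['w1']) @ params['w2'])
    return -jnp.mean(jnp.take_along_axis(logp, labels[:, None], axis=1))

@jax.jit
def step(params, pairs, labels):
    grads = jax.grad(loss_fn)(params, pairs, labels)
    # Weight decay folded into the update: shrink every weight a bit each step.
    return jax.tree_util.tree_map(
        lambda p, g: p - LR * (g + WEIGHT_DECAY * p), params, grads)

# The crucial part: don't stop at the first train/val gap. Run for orders of
# magnitude more steps than memorization takes and watch for the val-loss drop.
```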
[🧠 collective intelligence 🧠] I've been intrigued by the cellular automata (CA) concept for a long time, and by the potential for a mutually beneficial interaction between CAs and deep learning, so I decided to dig a bit deeper. Here are some interesting resources I found:
1/🧵
distill.pub/2020/growing-c… <- the one and only Distill. A nice introduction to how neural CAs work and how they may be a potentially useful model of morphogenesis and regeneration processes in developmental biology.
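For a feel of what the article describes, here's a simplified sketch of one neural CA update step: identity + Sobel perception, a tiny shared per-cell network, and stochastic updates. The channel/hidden sizes follow the article's defaults, but the parameter layout is my own framing and the alive-masking step is omitted for brevity:

```python
import jax
import jax.numpy as jnp
from jax.scipy.signal import convolve2d

CHANNELS, HIDDEN = 16, 128
SOBEL_X = jnp.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
SOBEL_Y = SOBEL_X.T

def init_ca(key):
    return {'w1': jax.random.normal(key, (3 * CHANNELS, HIDDEN)) * 0.02,
            'w2': jnp.zeros((HIDDEN, CHANNELS))}  # zero init: CA starts as a no-op

def perceive(state):                      # state: (H, W, CHANNELS)
    def per_channel(ch):                  # ch: (H, W) -> (H, W, 3)
        return jnp.stack([ch,
                          convolve2d(ch, SOBEL_X, mode='same'),
                          convolve2d(ch, SOBEL_Y, mode='same')], axis=-1)
    feats = jax.vmap(per_channel, in_axes=2, out_axes=2)(state)  # (H, W, C, 3)
    return feats.reshape(state.shape[0], state.shape[1], -1)     # (H, W, 3C)

def ca_step(params, state, key):
    hidden = jax.nn.relu(perceive(state) @ params['w1'])
    delta = hidden @ params['w2']
    # Stochastic update: each cell fires independently with probability 0.5.
    fire = jax.random.bernoulli(key, 0.5, state.shape[:2])[..., None]
    return state + delta * fire
```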
When successful people say they're not the best at anything, what they really mean is that they're not the best along any of the dimensions that we humans have a name for and can easily quantify (running 100m/chess/competitive programming/most traveled/h-index...)
1/
But if we were to quantify "bestness" as the length of the resultant vector across all of the relevant dimensions, then the scoreboard changes.
All in all, they are usually the most well-rounded professionals.
2/
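A toy illustration of the idea, with made-up numbers: a well-rounded (0.7, 0.8, 0.6) profile outscores a narrow (0.99, 0.2, 0.1) specialist once you measure the Euclidean length of the whole vector:

```latex
% Toy numbers, purely illustrative.
\[
  \lVert v \rVert_2 = \sqrt{\textstyle\sum_i v_i^2}, \qquad
  \lVert (0.7,\,0.8,\,0.6) \rVert_2 \approx 1.22
  \;>\;
  \lVert (0.99,\,0.2,\,0.1) \rVert_2 \approx 1.02
\]
```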
That's why we don't have a global scoreboard for "the best entrepreneur", "the best leader", etc.
These roles have immense breadth, and it's impossible to quantify and rank those people.
3/
In this video, I build an MLP (multi-layer perceptron) and train it as a classifier on MNIST (although it's trivial to swap in a more complex dataset) - all this in pure JAX (no Flax/Haiku/Optax).
2/
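For the curious, here's a minimal sketch of what such a pure-JAX MLP setup might look like. Layer sizes, init scheme, and function names are my own guesses, not necessarily what's in the video:

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes=(784, 512, 256, 10)):
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, wkey = jax.random.split(key)
        params.append((jax.random.normal(wkey, (din, dout)) * jnp.sqrt(2.0 / din),
                       jnp.zeros(dout)))  # He init for the ReLU layers
    return params

def forward(params, x):                    # x: (batch, 784) flattened images
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b                       # raw logits

def loss_fn(params, x, y):                 # y: (batch,) integer labels
    logp = jax.nn.log_softmax(forward(params, x))
    return -jnp.mean(jnp.take_along_axis(logp, y[:, None], axis=1))

@jax.jit
def train_step(params, x, y, lr=1e-3):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```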
I then add cool visualizations such as:
* Visualizing the MLP's learned weights
* Visualizing t-SNE embeddings of a batch of images
* Finally, analyzing the dead neurons (see the sketch below)
3/
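On that last point, a tiny sketch of how dead-neuron analysis can work with the params/forward structure sketched above: call a ReLU neuron "dead" if it outputs zero for every example in a (large) batch:

```python
import jax
import jax.numpy as jnp

def dead_neuron_fractions(params, x):
    # Walk the hidden layers; a neuron silent on the whole batch receives
    # no gradient through ReLU and is effectively dead.
    fracs = []
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
        fracs.append(jnp.mean(jnp.all(x == 0, axis=0)))
    return fracs
```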
[🔥 Learn ML for beginners 🥳] I recently said I'd be binge-watching fast.ai's Practical Deep Learning for Coders, and I did - here are my final thoughts!
I'm mainly going to contrast it with @coursera's course as that's the course I took back in late 2018.
1/
Verdict:
If you're in high school or college, or more precisely somebody who still has difficulties creating their own learning program (i.e. no experience with self-education), I'd recommend taking @coursera's course - it's more streamlined.
2/
You'll know exactly when to read, watch, or code.
On the other hand, if you already have some experience (you've had some tech internships/jobs), you're considering switching careers (again, you're experienced), or you simply want to integrate deep learning into your own domain...
3/
Again, thanks to @PetarV_93, @relja_work, Cameron Anderson, and Saima Hussain for being supportive throughout this journey!
2/
In this blog post you'll find:
* The details on how @DeepMind's hiring pipeline is structured.
* Many tips on how to prepare for the world's top-tier AI labs (DeepMind, OpenAI, etc.) - for research engineering roles, but I guess many of the tips will apply to research scientist roles as well.
3/