[🧠 Paper Summary 📚] An interesting paper was recently posted to arXiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021).

The main idea is this:
1/ 🧵
If you have an overparametrized neural network (more params than the # of data points in your dataset) and you train it way past the point where it has memorized the training data (as suggested by a low training loss and a high val loss), all of a sudden the network will...

2/
learn to generalize, as suggested by a rapid decrease in the val loss (colloquially, the network "groks", i.e. it has understood/figured out the task).

Practitioners usually stop training networks at the 1st sign of overfitting (as evidenced by an increasing gap between train/val loss).
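To make the setup concrete, here is a minimal, hedged sketch of a grokking-style experiment in PyTorch. Everything here (the modular-addition task, the tiny MLP, the train/val split fraction, and the hyperparameters) is illustrative rather than the paper's exact configuration, and whether/when grokking kicks in depends heavily on those choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97  # toy algorithmic task: predict (a + b) mod P

# The whole dataset is tiny: all P*P input pairs.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Train on a small fraction of the pairs, validate on the rest.
perm = torch.randperm(P * P)
n_train = int(0.3 * P * P)
train_idx, val_idx = perm[:n_train], perm[n_train:]

class TinyNet(nn.Module):
    """Over-parameterized MLP over embedded (a, b) tokens (hypothetical architecture)."""
    def __init__(self, p=P, dim=128):
        super().__init__()
        self.embed = nn.Embedding(p, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, ab):
        x = self.embed(ab).flatten(1)  # (batch, 2*dim)
        return self.mlp(x)

model = TinyNet()
# Weight decay matters a lot in this kind of setup (more on that below).
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

# The key point: keep training long after train accuracy hits ~100%,
# and watch whether val accuracy suddenly jumps much later.
for step in range(100_000):
    opt.zero_grad()
    loss = F.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, round(accuracy(train_idx), 3), round(accuracy(val_idx), 3))
```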

3/
Interestingly enough, the authors of the paper forgot to turn off the model training, and that led to this paper haha! @OpenAI @exteriorpower

The phenomenon is particularly evident on smaller datasets.

This goes against the common understanding in statistics that suggests...

4/
that you want to have underparametrized models so that you force the model to learn the rule (and thus generalize to unseen data) instead of memorizing the training dataset.

The trick is that heavily overparametrized models, combined with regularization (especially weight decay), seem to be...

5/
able to "carve out" that simple model from the big overparametrized NN.

Also, once again, there seems to be some correlation between flatter loss regions and good generalization capabilities. It may be that weight decay is especially good at leading the optimization process...

6/
towards those flatter regions of the loss landscape.
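To make that weight-decay intuition a bit more tangible (continuing the hypothetical sketch above, not the paper's exact recipe): AdamW's decoupled weight decay shrinks every weight a little on each step, so only the weights that the task gradient keeps reinforcing stay large, which is one way a simpler, more general solution could get "carved out" of the big network.

```python
# Conceptual per-step effect of decoupled weight decay (AdamW):
#   w <- w - lr * adam_step(grad)   # fit the training data
#   w <- w - lr * wd * w            # shrink all weights toward zero
# Hypothetical ablation on the toy setup above: toggle only the weight decay
# and compare how long it takes (if ever) for validation accuracy to jump.
opt_no_decay   = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.0)
opt_with_decay = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
```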

Check out the paper here:
arxiv.org/abs/2201.02177

Also, @ykilcher nicely explained it here:


#deeplearning #overparametrization #grokking

7/7


