[🧠 Paper Summary 📚] An interesting paper was recently posted to arXiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021).

The main idea is this:
1/ 🧵
If you take an overparametrized neural network (more parameters than data points in your dataset) and train it way past the point where it has memorized the training data (as indicated by a low training loss and a high validation loss), all of a sudden the network will...

2/
learn to generalize, as indicated by a rapid drop in the validation loss (colloquially, the network "groks" the task, i.e. it finally understands/figures it out).

Practitioners usually stop training at the first sign of overfitting (as evidenced by a widening gap between training and validation loss), so this delayed generalization normally goes unobserved.
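For context, here is a minimal PyTorch sketch of the kind of setup the paper studies: a small network trained on a modular-addition task ((a + b) mod p) with only a fraction of all pairs used for training, run far past the point of memorization while logging train/val accuracy. The paper uses a small transformer; a simple embedding + MLP is substituted here to keep the sketch short, and the hyperparameters and step count are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a grokking-style experiment (illustrative, not the
# paper's exact setup): modular addition, small training fraction,
# weight decay, and training far past memorization.
import torch
import torch.nn as nn

p = 97                                           # modulus for (a + b) mod p
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# Small training fraction -> the network can memorize it quickly.
perm = torch.randperm(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

class Net(nn.Module):
    def __init__(self, p, d=128):
        super().__init__()
        self.emb = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(),
                                 nn.Linear(256, p))
    def forward(self, x):                        # x: (batch, 2) token ids
        return self.mlp(self.emb(x).flatten(1))

model = Net(p)
# Weight decay is reported in the paper to strongly affect whether/when
# the delayed generalization appears; the exact value here is a guess.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

# Keep training long after the training set is memorized, instead of
# early-stopping at the first sign of overfitting.
for step in range(1, 100_001):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"val acc {accuracy(val_idx):.3f}")
```

In runs of this kind of setup, the typical pattern is that training accuracy saturates early while validation accuracy stays near chance for a long stretch, then jumps late in training; the exact timing depends heavily on regularization and the training fraction.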
