[🧠 Paper Summary 📚] An interesting paper was recently posted to arXiv: "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" (although it originally appeared in May 2021).

The main idea is this:
1/ 🧵
If you take an overparametrized neural network (more parameters than data points in your dataset) and train it way past the point where it has memorized the training data (as indicated by a low training loss and a high validation loss), all of a sudden the network will...

2/
learn to generalize, as indicated by a rapid drop in the validation loss (colloquially, the network "groks" the task, i.e. it finally understands/figures it out).

Practitioners usually stop training at the first sign of overfitting (as evidenced by a widening gap between training and validation loss), so this delayed generalization normally goes unobserved.
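For context, here is a minimal PyTorch sketch of the kind of setup the paper studies: a small network trained on a modular-addition task ((a + b) mod p) with only a fraction of all pairs used for training, run far past the point of memorization while logging train/val accuracy. The paper uses a small transformer; a simple embedding + MLP is substituted here to keep the sketch short, and the hyperparameters and step count are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a grokking-style experiment (illustrative, not the
# paper's exact setup): modular addition, small training fraction,
# weight decay, and training far past memorization.
import torch
import torch.nn as nn

p = 97                                           # modulus for (a + b) mod p
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# Small training fraction -> the network can memorize it quickly.
perm = torch.randperm(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

class Net(nn.Module):
    def __init__(self, p, d=128):
        super().__init__()
        self.emb = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(),
                                 nn.Linear(256, p))
    def forward(self, x):                        # x: (batch, 2) token ids
        return self.mlp(self.emb(x).flatten(1))

model = Net(p)
# Weight decay is reported in the paper to strongly affect whether/when
# the delayed generalization appears; the exact value here is a guess.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

# Keep training long after the training set is memorized, instead of
# early-stopping at the first sign of overfitting.
for step in range(1, 100_001):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"val acc {accuracy(val_idx):.3f}")
```

In runs of this kind of setup, the typical pattern is that training accuracy saturates early while validation accuracy stays near chance for a long stretch, then jumps late in training; the exact timing depends heavily on regularization and the training fraction.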
