Spent some time investigating history of "double descent". As a function of model complexity, I haven't seen it described before 2017. As a function of sample size, it can be traced to 1995; earlier research seems less relevant. Also: I think we need a better term. Thread. (1/n)
The term "double descent" was coined by Belkin et al 2019 pnas.org/content/116/32… but the same phenomenon was also described in two earlier preprints: Spigler et al 2019 iopscience.iop.org/article/10.108… and Advani & Saxe 2017 arxiv.org/abs/1710.03667 (still unpublished?) (2/n)
I don't like the term "double descent" because it has nothing to do with gradient descent. And nothing is really descending. It's all about bias-variance tradeoffs, so maybe instead of the U-shaped tradeoff one should talk about \/\-shaped? И-shaped? UL-shaped? ʯ-shaped? (3/n)
@PreetumNakkiran et al. drew attention to the fact that the same \/\-shape happens also as a function of sample size: see ICLR 2020 openreview.net/forum?id=B1g5s… and his follow-up preprints. The reviews on Openreview are interesting because they point to some much earlier work. (4/n)
Specifically, Opper 1995 (in The Handbook of Brain Theory and Neural Networks) reported \/\-shaped risk (as a function of sample size) for a linear model ki.tu-berlin.de/fileadmin/fg13…. See also Opper & Kinzel 1996 or Fig 10 in Opper 2001 review ki.tu-berlin.de/fileadmin/fg13… (5/n)
Furthermore, from this tweet I learned about the work of Duin and found that Duin 1995, independently of Opper 1995, reported the same thing: rduin.nl/papers/scia_95…. See also Raudys & Duin 1998, Loog & Duin 2012, etc. (6/n)
Duin calls this the "peaking phenomenon" and says it goes back to the 1960s, but I don't quite get it. E.g. here Duin cites Hughes 1968 ieeexplore.ieee.org/abstract/docum… but I think there it's just the standard U-shaped underfitting/overfitting tradeoff, isn't it? (7/n)
Duin also refers (e.g. here 37steps.com/2448/trunks-ex…) to Trunk 1979 ieeexplore.ieee.org/document/47669… as a "very clear" example of the "peaking phenomenon", but there I also only see U-shaped overfitting. If so, I think the "peaking phenomenon" terminology is only confusing. (8/n)
I'd be very grateful for any additions/corrections to this historical overview. See e.g. this last work by @PreetumNakkiran for many more recent references. END. (9/9)
PS. I should have pinged more authors of the mentioned papers: @advani_madhu @SaxeLab @mario1geiger @ilyasut @ShamKakade6 @tengyuma (and others).
PPS. Oh wow. See this answer and replies below. Thanks everybody for the ongoing discussion.
PPPS. @andrewgwils linked below to his new preprint (arxiv.org/abs/2003.02139) citing even earlier work by Opper for "non-monotonic generalization capability"! Here is Opper et al. 1990 iopscience.iop.org/article/10.108…:
PPPPS. @Tweetteresearch linked to his 1993 paper that led me to a bunch of 1989 papers from Opper, Kinzel, Krogh, and others, discussing divergence of the unregularized risk at P/N=1. But Opper 1990 still remains the reference with the earliest \/\ plot.
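For readers who want to see the \/\ shape as a function of sample size, here is a minimal toy sketch. It is not taken from any of the papers above. It assumes P is the number of training samples and N the input dimension, uses isotropic Gaussian inputs with a noisy linear teacher, and fits unregularized minimum-norm least squares via the pseudoinverse. The function name `min_norm_test_risk` and all parameter values are made up for illustration.

```python
import numpy as np


def min_norm_test_risk(n_samples, n_features=50, noise=0.5, n_trials=200, seed=0):
    """Average excess test risk of minimum-norm least squares (toy setup)."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(n_trials):
        # Hypothetical teacher: random weights with norm close to 1.
        w_true = rng.standard_normal(n_features) / np.sqrt(n_features)
        X = rng.standard_normal((n_samples, n_features))
        y = X @ w_true + noise * rng.standard_normal(n_samples)
        # The pseudoinverse gives the least-squares solution when P > N
        # and the minimum-norm interpolating solution when P < N.
        w_hat = np.linalg.pinv(X) @ y
        # For isotropic Gaussian test inputs, the excess risk
        # E[(x @ w_hat - x @ w_true)**2] equals ||w_hat - w_true||^2.
        risks.append(np.sum((w_hat - w_true) ** 2))
    return float(np.mean(risks))


if __name__ == "__main__":
    # Sweep the sample size P at fixed N = 50; the risk should peak near P = N.
    for n in (10, 25, 40, 50, 60, 100, 200):
        print(n, round(min_norm_test_risk(n), 3))
```

In this setup the risk first goes down, then blows up around P/N=1, then goes down again. The average at P=N is very noisy across seeds because the smallest singular value of a square Gaussian matrix can be arbitrarily close to zero, so only the qualitative shape should be read off.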
Keep Current with Dmitry Kobak
