New @ #ICML2021: When a trained model fits the clean training data well but fits randomly labeled training data (mixed into the training set) poorly, its generalization to the population is guaranteed!
This result makes deep connections between label noise, early learning, and generalization. Key takeaways: 1) the early learning phenomenon can be leveraged to produce post-hoc generalization certificates; 2) this can be done simply by adding randomly labeled (unlabeled) data to the training set
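For intuition, here is a minimal sketch of the idea in toy form. It is not the paper's exact algorithm or bound: the toy dataset, the logistic-regression model, and the heuristic "certificate" printed at the end are all illustrative assumptions, and the paper's actual guarantee includes concentration terms omitted here.

```python
# Sketch (assumed setup, not the paper's exact procedure): mix randomly labeled
# points into the training set, train as usual, then compare the model's error
# on the clean portion vs. the randomly labeled portion.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Clean training data: two Gaussian blobs (toy stand-in for a real dataset).
n_clean, n_random = 500, 100
X_clean = np.vstack([rng.normal(-1, 1, (n_clean // 2, 2)),
                     rng.normal(+1, 1, (n_clean // 2, 2))])
y_clean = np.repeat([0, 1], n_clean // 2)

# "Unlabeled" points drawn from the same mixture, assigned *random* labels.
X_rand = np.vstack([rng.normal(-1, 1, (n_random // 2, 2)),
                    rng.normal(+1, 1, (n_random // 2, 2))])
y_rand = rng.integers(0, 2, n_random)

# Train a single model on the union of clean and randomly labeled data.
model = LogisticRegression().fit(np.vstack([X_clean, X_rand]),
                                 np.concatenate([y_clean, y_rand]))

err_clean = np.mean(model.predict(X_clean) != y_clean)  # fit to clean data
err_rand = np.mean(model.predict(X_rand) != y_rand)     # fit to random labels

# Heuristic certificate in the spirit of the result: low error on clean data
# together with near-chance (~0.5) error on the randomly labeled data suggests
# the model has not memorized label noise. If the model fits the random labels
# well, the certificate becomes vacuous, matching the story in the thread.
print(f"clean train error:   {err_clean:.3f}")
print(f"random-label error:  {err_rand:.3f}")
print(f"heuristic certificate: {err_clean + max(0.0, 1 - 2 * err_rand):.3f}")
```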
The work translates the early learning phenomenon into a generalization guarantee *without ever explicitly invoking the complexity of the hypothesis class*, and we hope others will dig into this result and go deeper.
This project represents roughly a year of constant work. We had initial results on ERM last summer but kept pushing to articulate the idea as fully as possible. We're happy about the publication, but more excited to finally share this work with our community.
Also, if you find any typos and send to Siva, @saurabh_garg67 owes him one coffee per typo, so help a statistician out...