Jul 5, 2022 • 11 tweets • 5 min read
Neural scaling laws are great for predictability, but power law scaling is slow, especially in the large data regime, where 10x the data yields only small gains. Can we do better? We show that exponential scaling is possible via intelligent data pruning.
arxiv.org/abs/2206.14486
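To make "slow" concrete: under a power law, a tenfold increase in data buys only a constant-factor error reduction. A worked example with an illustrative exponent (the value of α below is a stand-in for this sketch, not a number from the paper):

```latex
% Power-law scaling: error falls polynomially in dataset size D.
% With an illustrative alpha = 0.1, 10x the data shrinks error by
% only ~21%; exponential scaling, eps(D) ~ exp(-bD), would compound
% far faster.
\[
  \varepsilon(D) \propto D^{-\alpha}
  \quad\Rightarrow\quad
  \frac{\varepsilon(10D)}{\varepsilon(D)} = 10^{-\alpha}
  \approx 0.79 \quad \text{for } \alpha = 0.1 .
\]
```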
In work led by Ben Sorscher and Robert Geirhos and done in collaboration with @sshkhr16 and @SuryaGanguli, we show both theoretically and empirically that exponential scaling is possible even on ImageNet so long as you can accurately rank data point importance.
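For concreteness, here is a minimal sketch of importance-ranked pruning in the spirit of the paper's self-supervised prototype metric: cluster embeddings with k-means and score each example by its distance to the nearest prototype. The function name, parameters, and toy data are illustrative assumptions; in practice the embeddings would come from a pretrained encoder.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_by_prototype_distance(embeddings, keep_frac, n_clusters=10, seed=0):
    """Rank examples by a self-supervised difficulty score and keep the
    hardest `keep_frac` fraction.

    Difficulty here is the distance to the nearest k-means centroid in
    embedding space: points far from their cluster prototype are treated
    as harder / more informative.
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(embeddings)
    # Distance from each point to its assigned prototype.
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    # Keep the examples with the largest prototype distance.
    n_keep = int(keep_frac * len(embeddings))
    return np.argsort(-dists)[:n_keep]

# Example: keep the hardest 50% of a toy dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))  # stand-in for encoder embeddings
kept = prune_by_prototype_distance(X, keep_frac=0.5)
print(kept.shape)                # (500,)
```

One nuance from the paper: which examples to keep depends on how much data you have. Roughly, with abundant data it pays to keep hard examples, while with scarce data keeping easy examples works better.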
Mar 2, 2020 • 7 tweets • 6 min read
Recent studies have suggested that the earliest iterations of DNN training are especially critical. In our #ICLR2020 paper with @jefrankle and @davidjschwab, we use the lottery ticket framework to rigorously examine this crucial phase of training.
arxiv.org/abs/2002.10365
Existing methods can't find winning lottery tickets at initialization in larger networks; winning tickets only seem to emerge early in training. We exploit this in our experiments as a causal tool for measuring the impact of various network properties on this early phase of training.
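A minimal sketch of the underlying mechanic, iterative magnitude pruning with rewinding to an early iteration rather than to initialization: train briefly, snapshot the weights at iteration k, train to completion, prune the smallest-magnitude surviving weights, rewind the survivors to the snapshot, and repeat. The toy model, data, and step counts below are illustrative placeholders, not the paper's setup.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
X, y = torch.randn(256, 20), torch.randn(256, 1)  # toy regression data

# Prune only weight matrices, as is standard for lottery tickets.
masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
         if n.endswith("weight")}

def train(n_steps, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(n_steps):
        loss = nn.functional.mse_loss(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():  # keep pruned weights at zero
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])

# Train briefly, then snapshot the weights at iteration k (not at init).
REWIND_STEP = 50
train(REWIND_STEP)
rewind_state = copy.deepcopy(model.state_dict())

# IMP with rewinding: train, prune 20% of surviving weights by magnitude,
# rewind the survivors to the iteration-k snapshot, repeat.
for _ in range(3):
    train(500)
    with torch.no_grad():
        surviving = torch.cat([p.abs()[masks[n].bool()]
                               for n, p in model.named_parameters()
                               if n in masks])
        cutoff = torch.quantile(surviving, 0.2)
        for n, p in model.named_parameters():
            if n in masks:
                masks[n] *= (p.abs() > cutoff).float()
    model.load_state_dict(rewind_state)  # rewind to iteration k
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])         # re-apply the pruning mask

kept = sum(m.sum().item() for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"weight sparsity: {1 - kept / total:.2f}")
```

Varying the rewind point (init vs. iteration k) is what makes this a causal probe: if rewinding to iteration k produces matching subnetworks where rewinding to init does not, the interval up to k must contain something essential.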
Mar 7, 2018 • 5 tweets • 3 min read
Just read through the @distillpub interpretability blog post from @ch402 and others. Stunning (and fun!) visualizations, but I wonder: what did these visualizations actually teach us about these networks? What do we know now that we didn't know before? distill.pub/2018/building-…
Saliency, activation maximization, etc. give us the impression of understanding, but it's often extremely difficult to articulate what exactly this "understanding" amounts to. Absent a falsifiable hypothesis and rigorous quantification, can we actually say we've learned anything?