Hello! Tamara Broderick, Ryan Giordano and I have a new working paper out!! It's called "An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?" arxiv.org/abs/2011.14999
Here comes the paper thread!!! Aaaaaah!!!
We propose a way to measure the dependence of research findings on the particular realisation of the sample. We find that several results from big papers in empirical micro can be overturned by dropping less than 1% of the data -- sometimes as few as 1-10 points -- even when samples are large.
We do find some results are robust, and can simulate totally robust results, so this thing genuinely varies in practice. You might think that the sensitive cases must have outliers or spec problems, but they don't need to, and our headline application has binary outcome data!
So what is this metric? Ok, for any given result (sign of an effect, significance, whatever) we ask if there is a small set of observations in the sample with a large influence on that result, in the sense that removing them would overturn it.
This is like asking how bad it could be for your result if a small percentage of the data set was lost, or a small percentage of your population of interest was not in the sample.
Such exclusions are pretty common, both because of practical difficulties of perfectly randomly sampling the real world (and humans processing the data), and b/c the world is always changing across space and time, even if only a little.
Typically it's not reasonable to assume these deviations from the ideal thought experiment are random: there's usually a reason you can't perfectly sample people or places, or interpret everyone's data intelligibly, or predict the future!
So we want to know if there's a small number of highly influential points in the sample, capable of overturning our result if dropped. Finding them exactly is possible but usually computationally prohibitive -- you have to cycle through too many combinations most of the time.
We develop an approximation to the influence of removing any given set of points. It's a Taylor expansion type of thing, but what's exciting is YOU CAN *ALWAYS* CHECK IF THIS APPROXIMATION WORKED IN *EVERY* GIVEN SAMPLE! So it's not the usual bullshit "trust my big math" thing.
Our approach identifies these approximate-high-influence points, so you can always remove them, re-run the analysis once, and see if the result changes. Whatever you get is an exact lower bound on the true sensitivity, since at worst we missed out on some higher-influence points.
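If you want the gist in code: here is a minimal numpy sketch of the drop-and-recheck idea for plain OLS. To be clear, this is my own illustrative toy, not the paper's implementation and not the zaminfluence API -- all the function names here are made up.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients, residuals, and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    return beta, resid, XtX_inv

def approx_influence(X, y, coef_index):
    """First-order influence scores: dropping point i is predicted to change
    beta[coef_index] by roughly -influence[i]."""
    beta, resid, XtX_inv = ols_fit(X, y)
    influence = (X @ XtX_inv[:, coef_index]) * resid
    return beta[coef_index], influence

def drop_and_recheck(X, y, coef_index, max_drop_frac=0.01):
    """Drop the approximately most damaging points (at most max_drop_frac of
    the sample), refit once, and return old estimate, new estimate, dropped rows."""
    n = len(y)
    est, infl = approx_influence(X, y, coef_index)
    # To push the estimate toward a sign flip, drop the points whose removal
    # is predicted to move it hardest against its current sign.
    order = np.argsort(-np.sign(est) * infl)
    drop = order[: int(np.floor(max_drop_frac * n))]
    keep = np.setdiff1d(np.arange(n), drop)
    refit, _, _ = ols_fit(X[keep], y[keep])
    return est, refit[coef_index], drop
```

The paper's version handles general Z-estimators and searches for the smallest set predicted to flip the result, but the crucial step is the same as the last two lines here: you actually refit without those points and check, rather than trusting the approximation.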
In our applications we almost always achieve the claimed reversal (tho we discuss exceptions in the paper, and it seems like having true parameter values lying near the boundaries of their spaces is a problem even if you transform the parameter).
Now of course if you'd like some big math, we do have some big math for you. We formally derive the approximation for Z estimators (like GMM, OLS, IV, MLE) under regularity conditions.
We have explicit bounds on the approximation error for OLS and IV - it's small relative to the real change in the result. We show our metric is a semi-norm on the Influence Function, linking it to standard errors and gross error sensitivity, which are different norms on the IF.
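For the notation-inclined, here is the flavour of the approximation in my own shorthand (not necessarily the paper's exact notation). For a Z-estimator solving the sample estimating equation, the first-order prediction for the effect of dropping a set S of observations is:

```latex
% \hat\theta solves (1/n) \sum_i G(\theta, d_i) = 0. First-order prediction
% for dropping a set S of observations:
\hat{\theta}_{-S} - \hat{\theta}
  \;\approx\; \frac{1}{n} \sum_{i \in S} \hat{H}^{-1} G(\hat{\theta}, d_i),
\qquad
\hat{H} \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{\partial G(\hat{\theta}, d_i)}{\partial \theta'}
```

The per-observation term is (up to sign and scaling) the empirical influence function, which is why norms on the IF keep showing up.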
Why are some analyses so sensitive? It turns out to be linked to the signal to noise ratio, where the signal is the strength of the empirical result, and the noise is large when the influence function is "big" in a specific sense.
For OLS, the value of the influence function for each data point is just that point's sample regression error times its leverage. One or the other is not enough. You need both at once, on the same point. That's part of why you can't eyeball this thing in the outcome or errors.
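Concretely, this is just the standard leave-one-out algebra for OLS -- nothing new to our paper -- written out here in my notation:

```latex
% Exact leave-one-out change in the OLS coefficient vector from dropping point i,
% with \hat{\varepsilon}_i the residual and h_i the leverage of point i:
\hat{\beta} - \hat{\beta}_{(-i)}
  \;=\; \frac{(X'X)^{-1} x_i \, \hat{\varepsilon}_i}{1 - h_i},
\qquad
h_i \;=\; x_i' (X'X)^{-1} x_i
```

A point only moves the estimate when the residual and the leverage-type term are both non-negligible at once.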
Wouldn't that "noise" show up in standard errors? No, because standard errors are divided through by root-N. Big N can make SEs small even when the noise is large. That's also why SEs disappear asymptotically, but our metric won't. Important as we move into the "big data" space.
Also, this noise reflects a distributional shape component that SEs don't, but that is NOT just about outliers: we show that this sensitivity to 1% of the sample can arise even in perfectly specified OLS inference on Gaussian data, and it also arises in practice on binary data.
This links up to something we were discussing on twitter earlier this year: what's intuitively wrong with running an OLS linear reg when the X-Y data scatter is basically vertical? Well, many things, but one of them is that the signal to noise ratio is *probably* quite low.
The fact that this sensitivity can arise even when nothing is formally "wrong" with the classical inference can feel weird, because we are used to thinking of our SEs and performance metrics like bias, test size, etc as capturing all the uncertainty we have -- but they don't!
Classical procedures are only designed to capture one type of uncertainty: the variation in an estimator's finite-sample value across perfectly random resamplings of the exact same population.
But this hypothetical perfect resampling experiment doesn't really capture all the relevant uncertainty about results in social science. We're not physicists or crop yield analysts, so we shouldn't expect their statistical tools to be suitable for us.
We need to ask about data-dependence in ways that make more sense given how we actually generate and use empirical results!
*Bayesian whisper* also wouldn't you rather know about the dependence of a research result on the sample you HAVE rather than the dependence you could imagine having in some hypothetical thought experiment based on a resampling exercise you could never do? You would, come on!!
But let me be super clear: my own bayesian papers don't escape this problem and you should check out the paper if you want to see me dunk on myself for several pages. (My hunch is that I have been using overly weak priors.)
We think you should report our metric. We definitely don't think you should abandon results that aren't robust, but a lack of robustness should prompt a desire to understand the estimation procedure more deeply, and promote caution about generalizing the results too broadly.
We wrote you an R package to compute and report it automatically! It all uses Python at the moment, so you need to have that installed. Future versions will be able to do OLS and IV without Python though! github.com/rgiordan/zamin…
We really hope this paper can be part of a broader conversation about empirical social science that leads us all to try out new ways of interrogating our data and understanding our conclusions a lot more deeply.
Don't think of this new metric as "yet another thing you have to report (sigh)" but as yet another tool to illuminate the way in which your procedure is constructing information about the population from your sample. And what could be more important than that!? Nothing!
FIN!!!!
Also the London Review of Books is better than the New Yorker send tweet.
1. we have Lauren Oyler, who do u have? Jia?? Sad. 2. we have a Nabokov bingo square so incendiary I'd get immediately permabanned if I tweeted a screenshot. 3. we have new hits from Anne Carson
I do love the New Yorker and 2 NYer articles - one about the Botswanan diamond mines and one about the deep sea submarine - were 2 of the best things I read all year. But LRB is better and in your heart you know it.
It's Friday, it's hot as heck, nobody can do any work, it's time to read @MWillJr's paper on the impact of repealing gun permit-to-purchase laws on gun prevalence, gun homicides and suicides: morganwilliamsjr.com/wp-content/upl…
Purely speaking as an applied econometrician, this paper has several things that I love, and the first one is the use of generalized synthetic control. This is by now the most credible approach to understanding the impact that state-level legal changes like this are likely to have.
Synthetic control is still generally underused by economists, who seem to favour a battery of fixed effects instead (perhaps not realising that there is a huge cost to stripping out variation). This paper is one of the very few I've seen that avoids that common trap.
This is a really great thread for what to do if your attention span and reading comprehension and mental processing is shot to hell -- like, actually shot to hell.
I have found this general methodology to be really useful in life. Basically, if you're at level X on something (say, how much work you're able to complete in a day), you're just making it worse for yourself if you say "I'm going to get it together and do 5X tomorrow!"
Just assume in general that at best, at VERY MOST, you can expect tomorrow to be today + a 5% improvement in some direction. If you can figure out what's the right direction, then you can start aiming for your 5% improvement. Then you build on that if you keep doing it.
Overfitting is probably the most important concept that is missing from mainstream econometrics classes. Caltech prof Yaser Abu-Mostafa has an incredible intro lecture to overfitting here from his ML course:
If you take metrics with me, of course, you learn about overfitting. But often, this lesson is painful: it contradicts the powerful instinct that one should prefer complex, ideally nonparametric models to simple models that place strong structure on the data.
Overfitting affects all inference problems in economics because it arises when the true data-generating process is too complex for you to feasibly capture given the data and the tools that you have, and this is always the case in social science.
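If you've never seen the canonical demo, here's a toy version in numpy -- my own sketch, not from the lecture: fit polynomials of increasing degree to noisy data from a simple DGP and watch the training error keep falling while the held-out error turns around.

```python
import numpy as np

rng = np.random.default_rng(0)

def dgp(n):
    """Simple 'true' data-generating process: a cubic signal plus noise."""
    x = rng.uniform(-1, 1, n)
    y = 1.5 * x ** 3 - x + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = dgp(30)      # small estimation sample
x_test, y_test = dgp(1000)      # large held-out sample

for degree in (1, 3, 8, 15):
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    # Training MSE falls monotonically with degree; held-out MSE typically
    # bottoms out near the true complexity and then rises again.
    print(f"degree {degree:2d}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```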
Some folks outside Econ, especially in medicine, can't understand why the randomistas won a Nobel. I think this confusion arises because many people mistakenly believe the key ingredient to scientific progress is intelligence. But it isn't: the key ingredient is courage.
It's easy to see that doing randomized experiments could revolutionize our understanding of economic and social policy. But nobody was doing it, so nobody really knew. It was expensive, hugely time-consuming, and extremely risky.
Who had the guts to actually try it? Banerjee, Duflo and Kremer. And what's more they had the tenacity and foresight to pursue a new path and build the social and professional institutions that would allow others to join them if they chose.
Have you just accepted an offer of admission to an economics phd? Congratulations! Now read this excellent advice from Matthew Pearson on how to survive first year, it helped me and many of my friends: law.vanderbilt.edu/phd/How_to_Sur…
It also has the best cold open I've ever read: "Dear First-year Graduate Student,
Welcome to the threshold of hell (just kidding, more like the patio, or courtyard of hell)."
The one thing I disagree with here is I don't think you should want the phd - or anything - more than life itself. The most important thing is your physical and mental health, and you will have to be strong in holding to this priority in first year.