So some very smart folks have asked about how we would apply the AMIP metric to studies of rare events. This kicked off a discussion of what robustness checks are really for, and I want to take that set of questions seriously in this thread.
I think robustness checks mainly (ought to) function to illuminate how variation in the data is being used for inference, and we should then be able to discuss whether we think this is a reasonable situation and adjust our confidence in the results.
The problem is not that there is SOME change to which our analyses are sensitive -- of course there is; there has to be. If your results aren't affected by ANY change you make to the analysis, something has gone horribly wrong with the procedure.
I think @dmckenzie001 has said this before me (sorry if I misquote, David!) but it is not good to show a bunch of checks in which nothing the authors do changes the result at all. Suggests a lack of interest in truly understanding the analysis procedure.
The problem - the thing we rightly fear from robustness checks - is when analyses are sensitive to changes they ought NOT to be sensitive to according to our understanding of the world, or when results display sensitivities that would meaningfully affect our interpretation of them.
These more problematic changes we should investigate generally come in two flavours: (1) any and all very *small* changes, and (2) a set of particular changes to which any sensitivity signals a model/assumption failure.
Very small changes of any kind are generally important to check, because if analyses are very sensitive to those, it signals a fragility that we would usually like to know about, and which would often in itself affect our interpretation of the results.
All practitioners (myself included, Bayesians very much included) know that we don't have the exact right model, an exact random sample from the exact population of interest, etc. We need to know if small deviations from the ideal experiment would cause big changes in our results
Analysis of rare events is a case where it can be entirely reasonable for results we truly do care about to be based on 1% of the sample. But there are (at least) 2 reasons why I think you should still compute the AMIP and find the AMIS to overturn your results.
First, your *audience* needs to know that some tiny % of the sample drives the result. If you're confident this is perfectly ok, there should be no problem sharing that information very plainly.
It is easy to fail to fully appreciate this, especially in areas like econ of crime. If you sample 30K people, and get 6 murders in the control group and 3 in treatment, your audience needs to be absolutely and fully aware that the comparison your result rests on is 6 vs 3.
Now you might then want to adjust what you'd consider a problematic percentage. Of course dropping all murders from 1 group will affect things. But should dropping ONE murder overturn the result? Is that ok?
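To make that concrete, here is a quick Python sketch of my own (using the hypothetical 6-vs-3 murder counts from above, and assuming an even 15K/15K split, which the thread doesn't specify) showing how the comparison moves when a single event is dropped:

```python
import numpy as np

def rate_comparison(events_c, events_t, n_c, n_t):
    """Difference in event rates with an unpooled normal-approximation z statistic."""
    p_c, p_t = events_c / n_c, events_t / n_t
    se = np.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return p_t - p_c, (p_t - p_c) / se

# The thread's hypothetical: 30K people split evenly, 6 control murders vs 3 treated.
print(rate_comparison(6, 3, 15_000, 15_000))   # the headline 6 vs 3 comparison
print(rate_comparison(6, 2, 15_000, 15_000))   # drop a single treated murder
print(rate_comparison(5, 3, 15_000, 15_000))   # drop a single control murder
```

With counts this small, both the implied rate ratio and the test statistic move substantially when one murder is added or removed -- which is exactly the kind of sensitivity the audience should be told about plainly.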
One aspect of this issue is being able to define -- ideally beforehand!! -- what a "small" change is in a given analysis. We don't advocate for any particular filter or binary rule for robust / non robust in the paper for that reason.
Second, if you're structurally, by design, only using a tiny % of the data, it usually becomes questionable whether you really are in a "big N" world. Fat tails affect the functioning of the delta method. The efficiency of the mean (vs the median) is eroded extremely fast.
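As a rough illustration of that erosion (my own simulation, not from the paper): compare the sampling variance of the mean to that of the median as the tails fatten.

```python
import numpy as np

# Illustrative simulation: how quickly the sample mean's efficiency advantage
# over the median erodes as tails get fatter.
rng = np.random.default_rng(0)
n, reps = 1_000, 2_000

samplers = {
    "normal":    lambda: rng.standard_normal((reps, n)),
    "t, 10 df":  lambda: rng.standard_t(10, size=(reps, n)),
    "t, 3 df":   lambda: rng.standard_t(3, size=(reps, n)),
    "t, 2.1 df": lambda: rng.standard_t(2.1, size=(reps, n)),
}
for label, draw in samplers.items():
    x = draw()
    ratio = x.mean(axis=1).var() / np.median(x, axis=1).var()
    print(f"{label}: var(mean) / var(median) = {ratio:.2f}")  # < 1 means the mean wins
```

Under normal tails the mean comfortably beats the median; by the time the tails look like a t with just over 2 degrees of freedom, the ordering is emphatically reversed.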
Even if the rare event is a binary outcome, that means p in the model is very close to 0 or 1, and inference near the boundaries of parameter spaces is notoriously difficult to do. You should virtually never run an LPM if the true probability of an event lies near the boundary.
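A tiny illustration of the LPM point (entirely made-up data, not any particular study): fit a linear probability model when the true event probability sits around 1%, and watch the fitted "probabilities" escape [0, 1].

```python
import numpy as np

# Illustrative only: OLS on a binary outcome (the linear probability model)
# when the event is rare, so the true probability lives near the boundary.
rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
true_p = 1 / (1 + np.exp(-(-5 + 0.8 * x)))     # logistic truth, rare event
y = rng.binomial(1, true_p)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS = linear probability model
fitted = X @ beta
print("share of fitted probabilities below 0:", (fitted < 0).mean())
print("minimum fitted probability:", fitted.min())
```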
If p = 0 or 1 then your typical off the shelf estimator's behaviour is going to be nonstandard. Mostly people know that, or they figure it out fast, cause the computer will usually freak out.
But what we see from e.g. the weak IV literature, and the "local to unity" time series literature, is that once you are in the *neighbourhood* of pathology, you are usually already affected by the pathology.
So if p = 0.98 or 0.01 in your bernoulli trial, you're probably in trouble from a classical inference perspective. Usually at the very least your root N asymptotics are no good. (I teach this in the context of weak IV and it's the most fun we have all semester... for me anyway)
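Here's a quick simulation of that "neighbourhood of pathology" point for a simple proportion (my own illustrative numbers): the textbook normal-approximation interval, which leans on root N asymptotics, loses coverage as p approaches the boundary.

```python
import numpy as np

# Coverage of the Wald interval p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n)
# for a Bernoulli proportion, as the true p moves towards the boundary.
rng = np.random.default_rng(1)
n, reps, z = 500, 20_000, 1.96

for p in (0.5, 0.1, 0.02, 0.005):
    successes = rng.binomial(n, p, size=reps)
    p_hat = successes / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    covered = (p_hat - z * se <= p) & (p <= p_hat + z * se)
    print(f"p = {p}: Wald coverage = {covered.mean():.3f}")  # nominal level is 0.95
```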
This is not at all to suggest that the AMIP signals a problem with the classical inference. It need not. There are cases where the classical properties are fine and the AMIP is big (we simulated some).
But rare events - even rare binary events - are cases in which the classical inference will generally have problems even if the AMIP is small.
It is not inherently wrong for results to be based on 1% of the sample. It might be fine. But if so you need to (A) know it, and be absolutely crystal clear with your readers about it, and (B) be careful to use statistical techniques that are equipped to handle that situation.
We do not advocate for any fixed % rule, though since we made special note of dropping less than 1% in the abstract, that does tell you we thought of that as a small part of the data for our applications. But there's no one-%-fits-all.
In our R package, our default implementation does NOT report "here's what happens if you remove the most influential 1%". Instead, our default reports the minimal removal proportion needed to change the sign, the significance, or both, of the analysis.
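For intuition about what that computation involves, here is a rough Python sketch of the sign-change version of the idea for OLS: score each observation with a one-step influence approximation, then see how many of the most adversarial points you'd have to drop before the coefficient's sign flips. This is my own illustrative reconstruction, not the package's actual API and not the paper's exact algorithm (which also handles significance and is more careful than this).

```python
import numpy as np

def approx_min_drop_for_sign_flip(X, y, k):
    """Approximately: how many observations must be dropped to flip the sign
    of OLS coefficient k? Uses the standard one-step (leverage-ignoring)
    influence approximation instead of refitting for every candidate subset."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Approximate change in beta[k] from dropping observation i:
    #   delta_i ~= -((X'X)^{-1} x_i)_k * e_i
    delta = -(X @ XtX_inv[:, k]) * resid
    # Drop first the points whose removal pushes beta[k] hardest towards zero.
    order = np.argsort(delta) if beta[k] > 0 else np.argsort(-delta)
    running = beta[k] + np.cumsum(delta[order])
    flipped = np.nonzero(np.sign(running) != np.sign(beta[k]))[0]
    if flipped.size == 0:
        return None  # approximation says no set of drops flips the sign
    n_drop = int(flipped[0]) + 1
    return n_drop, n_drop / n
```

You would then ask whether that returned fraction of the sample is a "small" amount of data for your application -- which, as above, has to be judged case by case.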
I personally don't want to see this become a threshold-crossing rule, which I do not like in general (ask me about test size being 0.05 all the time! hate it!!!). But I do want to generally see the AMIP so I can adjust my understanding if I need to. FIN!

