We still have a relatively poor understanding of the relationship between research and policy. Program evaluation in particular is often motivated by a desire to make policy better. But how effective is program evaluation itself? Michelle Rao's JMP tackles this question.
You can find this paper and more at her research homepage, michellerao.com/research (assuming Elon does not sink this thread like a stone with his algorithm).
Let me explain what, in my view, is so exciting about Michelle's paper.
Michelle looks at conditional cash transfer (CCT) programs in Latin America, where civil services are technocratic, with high apparent demand for evidence to inform policy. But the correlation between the release of such evidence and changes in policy spending on the evaluated programs is zero.
Maybe policymakers are so involved with these programs that their experience allows them to form "correct" beliefs -- they don't update when they see evidence because they don't need to. Michelle builds a model of surprises that shows... this also is not happening (probably).
Maybe policymakers don't update off *any* individual evaluation, but instead consider bodies of evidence. Michelle uses Bayesian hierarchical models to aggregate the country-specific literatures and examines whether that pooled evidence correlates with policy spending. It does not.
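For intuition, here is a toy sketch of that kind of pooling: a generic textbook normal-normal hierarchical model on made-up study estimates, not Michelle's actual specification.

```r
# Toy Bayesian aggregation of a "country literature". Numbers are hypothetical;
# flat priors on mu and tau, per the standard normal-normal hierarchical setup.
y <- c(0.12, -0.03, 0.20, 0.05, 0.09)  # study-level effect estimates (made up)
s <- c(0.08, 0.10, 0.15, 0.06, 0.12)   # their standard errors (made up)

# Model: y_k ~ N(theta_k, s_k^2), theta_k ~ N(mu, tau^2).
# Marginal posterior of tau on a grid:
tau <- seq(0.001, 0.5, length.out = 500)
log_post <- sapply(tau, function(t) {
  w <- 1 / (s^2 + t^2)
  mu_hat <- sum(w * y) / sum(w)
  0.5 * log(1 / sum(w)) + sum(-0.5 * log(s^2 + t^2) - 0.5 * w * (y - mu_hat)^2)
})
post <- exp(log_post - max(log_post))
post <- post / sum(post)

# Posterior mean of the pooled effect mu, averaging over tau:
mu_given_tau <- sapply(tau, function(t) {
  w <- 1 / (s^2 + t^2)
  sum(w * y) / sum(w)
})
sum(post * mu_given_tau)  # the aggregated "literature" effect
```

The pooled estimate shrinks noisy studies toward a common mean; the question is then whether movements in this kind of aggregate track spending changes.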
She considers several different ways to codify the results of program evaluations, accounts for researcher degrees of freedom in how the evidence is framed, and performs sentiment analysis on paper abstracts. The results are remarkably consistent: the correlation is always zero.
A zero association between evaluation results and policy spending doesn't mean there's no causal relationship between them. But it does substantially constrain what that relationship can look like. In this case, it is hard to think of a homeostatic mechanism at play...
But even if the relationship between evidence and spending really is zero, it's not clear why. Ex ante there was every reason to think these policymakers want and use this evidence. They probably *do* want and use it -- but it doesn't seem to correlate with spending. Why not?
Michelle boils it down to 3 issues: the credibility, actionability, and generalizability of evidence. The big push in applied micro leads us to suspect credibility might be the thing that matters. But it isn't -- RCTs have zero correlation with spending.
The big push in meta-science and my own subfield on effect heterogeneity / external validity might lead us to suspect that it's generalizability that matters. But it looks like it isn't this either -- at least not the way we've currently been measuring it (hierarchical models).
Actionable evidence, measured by the timely release of results (especially when the program is linked to the ruling party), is the only thing strongly positively correlated with spending changes. This suggests the policy environment is changing over time -- quick release of results may be key.
Michelle's paper completely changed how I think about evidence and policy. Credibility and generalizability matter for us -- but it seems it may take more than that for research to influence policy spending at scale. If we want that to happen, it looks like we have more work to do.
I am particularly excited to present a student to the market with such an important paper and a strong and robust null headline result. This is the changing face of social science, and Michelle's work provides a blueprint I hope others will follow on a variety of fronts.
In conclusion: interview and hire Michelle!!!! She has a super strong pipeline, including a bunch of Bayesian evidence-aggregation work in the behavioural/gender space, and an upcoming RCT with the World Bank! michellerao.com/home
oh also she is on here, she just isn't addicted: @mrao_econ
let's see what DOGE has been cancelling at the department of education. @stuartbuck1 went through the list. will link to his post at the end of the thread. i will start with his 4th example: foundational research to understand how american kids are doing at school
DOGE has also cancelled an impact evaluation of after-school programs in high-poverty schools with the goals of reducing drugs and violence. the program is intact, but the *evaluation* of the program's effect, ie what it does or doesn't do to help kids, is cancelled!
fundamentally, whether you're smart enough or good enough is not your problem and, frankly, discerning that is not your comparative advantage and also not your job! your job is to decide what to attempt. that's all. i suggest attempting the things you'd be proudest to try + fail.
procrastinating on my real work by writing simulations of the bad controls problem in R
it's the last week of our semester, my brain is fried, and i just want to have fun with my best friend, the rnorm() function
this problem is so weird. we all know it's bad, but somehow that doesn't stop people from thinking they can study causal channels by controlling for the channels in a regression?? somehow even my own brain is primed to accept that as fine when i see it??
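in that spirit, here's a minimal rnorm() sketch of the problem (my own toy setup, assuming an unobserved factor hits both the channel and the outcome):

```r
# bad controls, the toy version: x is randomly assigned, m is the causal
# channel, u is an unobserved factor driving both m and y
set.seed(123)
n <- 1e5
u <- rnorm(n)
x <- rnorm(n)
m <- 0.5 * x + u + rnorm(n)            # the channel
y <- 0.3 * x + 0.7 * m + u + rnorm(n)  # total effect of x = 0.3 + 0.7*0.5 = 0.65

coef(lm(y ~ x))["x"]      # ~0.65: the total effect, recovered fine
coef(lm(y ~ x + m))["x"]  # ~0.05: neither the total (0.65) nor the direct (0.3)
                          # effect -- m is a collider for x and u, so conditioning
                          # on it opens a biasing path
```

controlling for the channel doesn't give you "the effect net of the channel"; it contaminates the estimate whenever anything unobserved drives both the channel and the outcome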
One thing I found depressing about the discussion we had here last time is that people started debating whether the author of this essay is sufficiently morally good to be worthy of our attention. I'm worried for us if this is where we're going.
For example, a couple of people replied saying "woah, wait a minute, she masked her kid's fever with Tylenol and tried to send him to school??? that's TERRIBLE". Yeah guys, she gives that as an example of a bad, desperate thing she did when she didn't have a good handle on life.
Something that has stuck with me from Kristin Neff's excellent book "Self-Compassion" is the exercise where you list 3 socially-valued traits at which you are above average, average, and below average.
This is such a good exercise for demonstrating the difference between self-compassion and self-esteem or self-regard. If you love and accept yourself, it's not a big deal to acknowledge that you're average at some stuff and have weaknesses alongside your strengths.
treating yourself badly isn't a tax credit you get to write off against the ethical implications of how well or poorly you treat others
sagest advice i can give grad students is to remember that table reformatting always takes a whole day, regardless of how trivial it initially appears
i tweet this stuff because i know how discouraging it feels to experience yourself as "unproductive" over days like this, but some stuff - especially formatting, admin, etc - just actually takes much longer than you would think it does, and people don't like to talk about it
like, i'm not kidding, i had days where i'd think "well if this is all i can accomplish in a whole day, i guess i should quit now, i'll never be able to compete" but turns out i can compete just fine and actually people just don't talk about how much they get done each day