Josh Merfeld · Mar 12
New report just dropped. Time for a thread.

You can find the replication report and the author's response in the linked thread (in the second tweet).
Let me start with a small point: I hate the title of the replication. It refers to a comment in a do-file but, to be perfectly honest, it's entirely possible that it was just poor word choice from an RA whose first language is not English. The data are much more damning.
First up: there again appears to be overlap with other papers that have "different" treatments! As with earlier replications, this overlap is not discussed in any of the three papers at issue here.
Second: the paper says that 76 primary schools were randomly assigned to treatment or control. But here's the thing: they were very clearly NOT randomly assigned. There are nine unions (admin areas) in the sample. Five are entirely treatment and four entirely control.
This is essentially impossible with actual randomization. The author claims "the contractor, using his practical knowledge, grouped treatment schools within contiguous unions to address local concerns". Since the "balance" checks come out okay, the author claims internal validity is also okay.
But that is NOT how internal validity with RCTs works. In addition, not catching this "error" would be indicative of amazingly poor RCT management and data management. He also had the contractor randomly assign schools. What? No researcher I know does that.
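Just how unlikely is a perfectly union-separated assignment? Here's a quick simulation sketch, not from the report: the per-union school counts are invented (the actual counts aren't in the thread), and I'm assuming a 38/38 split of the 76 schools.

```python
import random

# Hypothetical per-union school counts (sums to 76); the real counts
# aren't reported in the thread, so these are invented for illustration.
union_sizes = [10, 9, 9, 9, 8, 8, 8, 8, 7]
labels = [u for u, n in enumerate(union_sizes) for _ in range(n)]

def share_pure(draws=100_000, n_treat=38):
    """Fraction of random assignments in which EVERY union comes out
    all-treatment or all-control."""
    hits = 0
    for _ in range(draws):
        treated = set(random.sample(range(len(labels)), n_treat))
        counts = [0] * len(union_sizes)
        for i in treated:
            counts[labels[i]] += 1
        if all(c in (0, n) for c, n in zip(counts, union_sizes)):
            hits += 1
    return hits / draws

print(share_pure())  # 0.0 in practice: it essentially never happens by chance
```

There are on the order of 10^21 possible 38-school assignments and only a handful of them keep every union pure, so seeing a pure split in the data is a red flag, not a coincidence.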

Strike one.
In the author's response he names the contractor and blames him for everything. This is just honestly poor behavior. No shame.
There is clear inconsistency between the description of the data collection process and the actual data. It's hard to know exactly what's going on because the documentation isn't clear, so I'll move on from that point.
In the data itself, there is a mismatch in baseline test scores for the SAME STUDENTS in different datasets. Data was collected more than a decade ago, so some of this could be manual entry error, but there are a LOT of mismatches. Seems sus.
The really damning thing here is that there are systematic differences in this mismatch between the control and treatment groups. Treatment seems okay, but the control group's baseline values across datasets are different. Looks like control's baseline values were adjusted up!
Note that this matters because a higher baseline in control means a smaller control-group gain at endline, which mechanically inflates the estimated treatment effect...
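A sketch of the kind of check that surfaces this, with hypothetical file and column names: merge the two datasets on student ID and look at the discrepancy by treatment arm. Honest entry error should be noise centered near zero in both arms.

```python
import pandas as pd

# Hypothetical file/column names, purely to illustrate the check.
a = pd.read_csv("baseline_main.csv")    # student_id, treat, score_main
b = pd.read_csv("baseline_other.csv")   # student_id, score_other

m = a.merge(b, on="student_id")
m["diff"] = m["score_other"] - m["score_main"]

# Entry error: noisy, mean ~0 in BOTH arms.
# A one-sided shift in the control arm only is the red flag.
print(m.groupby("treat")["diff"].describe())
```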
There's a huge amount of abnormal bunching in the test scores, and that bunching is MUCH worse in the control group. (I'd also note that the baseline and endline distributions are wildly different. Best-practice RCT management would be to use the same instrument in both.)
I mean what even is this? There is no way these data came from the same instrument. It's impossible.
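Two quick diagnostics one could run here (again, all names hypothetical): tabulate how concentrated the scores are on their most common exact values by arm and round, and formally compare the baseline and endline distributions.

```python
import pandas as pd
from scipy.stats import ks_2samp

df = pd.read_csv("test_scores.csv")  # hypothetical: arm, round, score

# Bunching: the share of observations sitting on the five most common
# exact values, by arm and survey round.
top5 = (df.groupby(["arm", "round"])["score"]
          .apply(lambda s: s.value_counts(normalize=True).head(5).sum()))
print(top5)

# Same-instrument check: baseline and endline scores should at least
# share support and rough shape.
base = df.loc[df["round"] == "baseline", "score"]
end = df.loc[df["round"] == "endline", "score"]
print(ks_2samp(base, end))
```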
The author did a household survey to ask certain questions, saying that households were randomly sampled from the treatment and control groups. But there is a HUGE disparity in the probability of being in the survey across the groups, and only for one of the two grades.
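The check here is mechanical; a sketch with hypothetical names: compute the survey-inclusion rate by treatment arm and grade. Under random sampling the rates should be similar across arms within each grade.

```python
import pandas as pd

# Hypothetical files: full student roster vs. household-survey roster.
roster = pd.read_csv("students.csv")   # student_id, treat, grade
survey = pd.read_csv("hh_survey.csv")  # student_id

roster["surveyed"] = roster["student_id"].isin(survey["student_id"])
print(roster.pivot_table(index="grade", columns="treat",
                         values="surveyed", aggfunc="mean"))
```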
Skipping forward a bit... the same schools have different treatment statuses in different analyses. This alone could be sloppy data work or bad data entry, but it's just another "what?" in a long line of whats.
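This one is trivial to check, which is what makes it so strange. A sketch, assuming hypothetical analysis files: stack the school-level treatment indicator from each dataset and flag any school whose status isn't constant.

```python
import pandas as pd

# Hypothetical analysis files, each with school_id and treat.
files = ["analysis1.csv", "analysis2.csv", "analysis3.csv"]
stacked = pd.concat(pd.read_csv(f)[["school_id", "treat"]] for f in files)

flips = stacked.groupby("school_id")["treat"].nunique()
print(flips[flips > 1])  # any school listed here switches arms across files
```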
I find this particularly problematic: the author said the household survey asked about parents' opinions of the intervention (parent-teacher meetings). But control parents were more positive about the intervention? Very, very strange.
As far as I can tell, the author's response just says that these opinions are a secondary outcome, so no big deal. I don't think that's the right way to tackle this if the data are real.
Another thing that is very concerning: if the intervention was done as described, teachers had to spend 15 minutes with the parents of each child. Class size is around 59 on average, meaning that's an additional ~15 hours per month per teacher.
But teachers were paid only two dollars more PER YEAR for the intervention. The author says "the intervention [was] remarkably low cost," but that seems misleading, at best.
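To put numbers on it (my arithmetic, using the figures above): 59 children × 15 minutes ≈ 885 minutes, call it 15 hours per monthly round of meetings, so something like 150+ extra hours over a school year. At two dollars of extra pay per year, that works out to roughly a cent per additional hour.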
The replicators run some additional robustness checks. You probably won't be surprised to learn that the results don't hold up.
TLDR: I don't believe a word in the original study.

