MA2, by Burnette and colleagues, says ‘under some contexts, for some students.’ doi.org/10.1037/bul000…
If you know me, you know where this is going.
The short answer: Skip MA1. Read MA2.
The answer isn’t “yes” or “no”. Examining the data in both MA1 and MA2 reveals that, on average, the effect is close to zero. This is not a surprise. But for at-risk students, the average effect on academic achievement is moderate (~0.15 SD).
OK, but now for my thoughts. I want to begin by noting that I am a statistician and that I really do not care if GM works or not.
We focus on 4 best practices in large MAs. We contrast the two MAs, focusing on the different methodological choices they made. I will try to summarize.
1) MAs should prominently (abstract, results, discussion) quantify the heterogeneity in effect sizes.
MA2 prominently reports that 95% of effects fall between -0.08 and 0.35 (the 95% prediction interval, or PI).
MA1 focuses only on the average effect. There is no mention of the PI or of heterogeneity anywhere.
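A minimal sketch of where an interval like MA2's comes from, assuming a simple random-effects model fit with the DerSimonian-Laird estimator (not necessarily the method MA2 used). The between-study variance tau² is what turns an average into a range. The effect sizes and standard errors are made up for illustration, and the 1.96 multiplier is a normal approximation; a t quantile with k - 2 degrees of freedom is a common, more conservative choice.

```python
import math

def dersimonian_laird(effects, ses):
    """Random-effects pooled mean, its SE, and tau^2 (DerSimonian-Laird)."""
    w = [1 / s**2 for s in ses]                        # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # between-study variance, truncated at 0
    w_re = [1 / (s**2 + tau2) for s in ses]            # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    return mu, se_mu, tau2

def prediction_interval(mu, se_mu, tau2, z=1.96):
    """Approximate 95% PI for the true effect in a new, similar study."""
    half = z * math.sqrt(tau2 + se_mu**2)
    return mu - half, mu + half

# Hypothetical effect sizes (d) and SEs; NOT data from MA1 or MA2.
effects = [0.02, 0.30, -0.05, 0.18, 0.10, 0.45]
ses = [0.05, 0.08, 0.06, 0.10, 0.04, 0.12]
mu, se_mu, tau2 = dersimonian_laird(effects, ses)
lo, hi = prediction_interval(mu, se_mu, tau2)
print(f"average d = {mu:.2f}, tau = {math.sqrt(tau2):.2f}, 95% PI = [{lo:.2f}, {hi:.2f}]")
```

Reporting only `mu` would drop everything `tau2` carries, which is the point of best practice 1.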
2) Meta-analyses should include all the relevant within-study variation in effect sizes.
There is no need to average effect sizes to the study level. Doing so excludes an important source of variation.
MA2 includes all relevant effect sizes. MA1 does not.
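To see what averaging to the study level throws away, here is a toy illustration with hypothetical effect sizes (several outcomes or subgroups per study). It only compares the spread of all effects with the spread of study-level means; it is not itself a model. In practice, keeping every effect size means handling the dependence between them, e.g., with a three-level (multilevel) meta-analysis or robust variance estimation.

```python
from statistics import mean, pvariance

# Hypothetical effect sizes keyed by study (names are illustrative only).
effects_by_study = {
    "study_A": [0.02, 0.25, -0.10],
    "study_B": [0.40, 0.05],
    "study_C": [0.00, 0.30, 0.15, -0.05],
}

all_effects = [d for ds in effects_by_study.values() for d in ds]
study_means = [mean(ds) for ds in effects_by_study.values()]

var_all = pvariance(all_effects)      # spread across every effect size
var_means = pvariance(study_means)    # spread left after averaging within studies
# Average within-study variance: the part that study-level averaging discards.
var_within = mean(pvariance(ds) for ds in effects_by_study.values())

print(f"all effects: {var_all:.4f}, study means: {var_means:.4f}, within: {var_within:.4f}")
```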
Aside 1: These methodological choices matter.
MA1’s choices led them to conclude that GM does not work, since “the effect” is d = 0.04. MA2’s choices led them to conclude that GM works for particular subgroups and for particular outcomes. These are very different conclusions.
3) MAs should appropriately adjust for confounders, including study quality and publication bias.
‘Study quality’ must be measured, and any new measure needs its validity established. The best measures are agreed upon by the field.
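One common way to adjust for publication bias is PET (the precision-effect test): a weighted regression of effect sizes on their standard errors, where the intercept estimates the effect for a hypothetical study with SE = 0. This is a minimal sketch with made-up data; neither MA is claimed to have used this particular method, and alternatives (PEESE, selection models, trim-and-fill) can give different answers.

```python
def pet_estimate(effects, ses):
    """PET: weighted least squares of effect size on its SE (weights 1/SE^2).

    Returns (intercept, slope). The intercept is the predicted effect at SE = 0;
    a positive slope signals small-study effects such as publication bias.
    """
    w = [1 / s**2 for s in ses]
    sw = sum(w)
    x_bar = sum(wi * s for wi, s in zip(w, ses)) / sw
    y_bar = sum(wi * d for wi, d in zip(w, effects)) / sw
    sxy = sum(wi * (s - x_bar) * (d - y_bar) for wi, s, d in zip(w, ses, effects))
    sxx = sum(wi * (s - x_bar) ** 2 for wi, s in zip(w, ses))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical literature where small (noisy) studies report bigger effects.
effects = [0.45, 0.30, 0.20, 0.12, 0.08]
ses = [0.20, 0.15, 0.10, 0.06, 0.04]
b0, b1 = pet_estimate(effects, ses)
print(f"PET-adjusted effect = {b0:.2f}, slope on SE = {b1:.2f}")
```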
4) Meta-analyses should seek to explain heterogeneity using moderation analyses.
This requires more than one-variable-at-a-time models. Also, p > .05 does not mean that there is no variation: you cannot prove that the effect is constant.
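A minimal sketch of a meta-regression that looks at several moderators at once, rather than one at a time: weighted least squares with weights 1/(SE² + tau²). The moderator columns (a hypothetical at-risk indicator and a hypothetical study-quality indicator) and the data are made up. tau² is passed in rather than estimated, and coefficient standard errors are omitted to keep it short; dedicated software (e.g., the metafor package in R) handles all of this properly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                         # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def meta_regression(effects, ses, moderators, tau2=0.0):
    """WLS of effect sizes on several moderators simultaneously.

    moderators: one row per effect size (no intercept column).
    Weights are 1 / (SE^2 + tau2); tau2 is assumed known here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(row) for row in moderators]        # prepend intercept
    w = [1 / (s**2 + tau2) for s in ses]
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, effects)) for j in range(p)]
    return solve(xtwx, xtwy)

# Hypothetical rows: [at_risk (0/1), quality_ok (0/1)]; effects and SEs are made up.
rows = [[1, 1], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
effects = [0.20, 0.03, 0.14, 0.01, 0.17, 0.05]
ses = [0.08, 0.05, 0.10, 0.06, 0.07, 0.04]
coefs = meta_regression(effects, ses, rows, tau2=0.01)
print("intercept, at_risk, quality_ok:", [round(c, 3) for c in coefs])
```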
Aside 2: These methodological choices matter.
MA1 created their own measure of quality, using criteria that are out of sync with the field*. They then concluded that the literature was ‘low quality’ and focused their conclusions on a very narrow subset of studies. MA2 did not.
* e.g. 1: MA1 identifies studies that randomized classes or schools as 'low quality'. In education, these are the dominant RCT designs.
* e.g. 2: MA1's measure of financial conflict of interest (FCOI) is a ‘post-treatment’ variable: it measured *subsequent* success, after the study was published.
For those still reading: be like MA2. If you’re going to conduct a large MA with a lot of “heterogeneity in”, then it is your job to try to understand the “heterogeneity out.”
And please, can we stop with the “either/or”, “good/bad”, “yes/no” thinking about interventions? I’m tired of writing commentaries.
• • •
Buckle up, friends: I’ve started writing commentaries about meta-analyses.
In this one, we argue that (1) the effect of nudges on average is small (d = 0.08), but (2) also very heterogeneous (+/- 1.0) across studies.
In contrast, the original article’s abstract says that “the effect” is d = 0.43. They conclude that this effect is essentially constant, noting that nudges “affect behavior relatively independently of contextual study characteristics...”
Lately I have spoken with several people interested in building systems to help policymakers and practitioners leverage research to improve their decision-making in education. This has made me think a lot about the difference between data and evidence. 1/17
Evidence is data+, meaning data plus analysis methods, research design, assumptions, uncertainty, theory, and expertise. Evidence is expensive, whereas data is not. And evidence is, by design, not available for every question whereas data may be. 2/17
Practitioners want to know which (version of an) intervention will work in their very particular context. But there are infinite versions of these interventions and infinite different contexts. 3/17