Katie Corker @katiecorker

https://twitter.com/BrianNosek/status/1064550837173796864

As a #ML2 author, very excited to see this paper finally out! The headline that 50% of effects replicate buries the lede a bit. For me, the major finding here concerns heterogeneity. What's heterogeneity, you ask? Gather round...

Heterogeneity describes how much effects being replicated vary from lab to lab, *beyond* sampling error. Sampling error says we expect that by chance, some labs overestimate the true effect size, others underestimate. Heterogeneity says how much more we expect effects to vary.
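To make the distinction concrete, here is a minimal simulation sketch. Every number in it (the true effect of 0.5, the standard error of 0.1, the tau of 0.2, the 5000 sites) is made up for illustration and is not taken from #ML2; `simulate_site_estimates` is a hypothetical helper name.

```python
import random
import statistics

def simulate_site_estimates(true_d, tau, se, n_sites, seed=0):
    """Simulate one observed effect size per site.

    tau is the SD of the sites' true effects (heterogeneity);
    se is the SD of each site's sampling error.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sites):
        site_true = rng.gauss(true_d, tau)          # lab-level true effect
        estimates.append(rng.gauss(site_true, se))  # add sampling error
    return estimates

# Hypothetical values: same sampling error, with and without heterogeneity.
sd_sampling = statistics.stdev(simulate_site_estimates(0.5, 0.0, 0.1, 5000))
sd_het = statistics.stdev(simulate_site_estimates(0.5, 0.2, 0.1, 5000))

print(f"spread with sampling error only: {sd_sampling:.3f}")
print(f"spread with tau = 0.2 added:     {sd_het:.3f}")
```

With tau set to zero, the spread of site estimates is just the sampling error. With tau above zero, the spread grows to roughly sqrt(tau^2 + se^2). That extra spread is the heterogeneity.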

If differences between labs are to explain differences in replicability, there MUST be sizable heterogeneity in effects. The size of the heterogeneity caps how much variance between-site differences can explain.

So how big are the estimated heterogeneities in #ML2? For that, we need to look at *tau* (second column from the left in the table above).

Tau is the standard deviation of effects *in the units of effect size.* So for an effect size of d=0.8 with a tau of 0.5, we expect 68% of replication effect sizes to be between d=0.3 and d=1.3 (after accounting for sampling error).
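The arithmetic behind that 68% can be checked with a short sketch. It assumes true effects are normally distributed around the average effect, which is an assumption of this illustration; `central_range` is a hypothetical helper name.

```python
from statistics import NormalDist

def central_range(mean_d, tau, n_sd=1.0):
    """Return (low, high, share): the range within n_sd taus of the mean
    and the share of true effects it covers, assuming normality."""
    dist = NormalDist(mu=mean_d, sigma=tau)
    low, high = mean_d - n_sd * tau, mean_d + n_sd * tau
    share = dist.cdf(high) - dist.cdf(low)
    return low, high, share

# The example from the thread: d = 0.8 with tau = 0.5.
low, high, share = central_range(0.8, 0.5)
print(f"d = {low:.1f} to d = {high:.1f} covers {share:.0%} of true effects")
```

Swapping in a tau of 0.10 shrinks the same 68% range to d=0.7 through d=0.9, which gives a sense of how little room a small tau leaves.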

Look at the taus in Table 3. No, really, look at them! MOST OF THEM ARE ZERO! Only two of them are bigger than 0.10!

[Note: Don't be distracted by the statistically significant Q tests (columns 3-5) for the taus. Yes, there is "significant" heterogeneity for several of the effects, but the actual heterogeneity effect sizes (taus) are very small.]
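A rough sketch of how a "significant" Q can sit alongside a small tau. It uses the DerSimonian-Laird estimator, one common way to estimate tau; whether it matches the paper's exact method is not something this thread establishes. The ten effect sizes and the sampling variance are invented for illustration, and the critical value is the standard chi-square table value for df = 9.

```python
import math

def dersimonian_laird(effects, variances):
    """Return (Q, df, tau) using the DerSimonian-Laird moment estimator."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    return q, df, math.sqrt(tau2)

# Hypothetical data: 10 sites, each with a standard error of 0.05.
effects = [0.10, 0.15, 0.20, 0.25, 0.30, 0.10, 0.15, 0.20, 0.25, 0.30]
variances = [0.05 ** 2] * len(effects)
CHI2_CRIT_DF9 = 16.92  # 95th percentile of chi-square with 9 df

q, df, tau = dersimonian_laird(effects, variances)
print(f"Q = {q:.1f} on {df} df (significant: {q > CHI2_CRIT_DF9})")
print(f"tau = {tau:.3f}")
```

With precise site estimates, Q clears the significance threshold even though tau comes out well under 0.10. The test answers "is there any heterogeneity?", while tau answers "how much?".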

What does this mean in plain terms? It means that for most of these effects, there is no room for *lab level* differences to explain variability in effect size. This includes culture to the extent that culture is measured at the level of the lab.

These tiny heterogeneities stand in stark contrast to estimated heterogeneities from meta-analyses. Meta-analytic heterogeneity can be larger because of methodological differences in studies (i.e., not all studies are direct reps), but...

Selective reporting/p-hacking can also affect meta-analytic heterogeneity in ways that are not straightforward. Bottom line: meta-analytic heterogeneities are not good proxies for true heterogeneity.

However, the #ML2 results should give us serious pause when attributing meta-analytic variability to between-site differences. Multi-site registered reports let us estimate heterogeneity empirically with more accuracy.

tldr: Many Labs 2 results cast major doubt on the ability of *lab-level* hidden moderators to explain variability in experimental effect size estimates.