Brian Nosek @BrianNosek
Many Labs 2: 28 findings, 60+ samples, ~7,000 participants per study, 186 authors, 36 nations.

Successfully replicated 14 of 28 psyarxiv.com/9654g

ML2 may be more important than Reproducibility Project: Psychology. Here’s why...

@michevianello @fredhasselman @raklein3
ML2 minimized boring reasons for failure. First, by using original materials and Registered Reports (cos.io/rr), all 28 replications met expert-reviewed quality-control standards. Failures to replicate cannot easily be dismissed as replication incompetence. psyarxiv.com/9654g
Second, the ML2 replication median sample size (n = 7157) was 64x the original median sample size (n = 112). If there was an effect to detect, even a much smaller one, we would detect it. The resulting estimates have very high precision. psyarxiv.com/9654g
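That power claim can be checked with a back-of-the-envelope calculation. Here is a sketch using a normal approximation to the two-sample t-test; the effect sizes d = 0.5 and d = 0.25 are illustrative choices, not figures from ML2:

```python
import math

def power_two_sample(d, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided two-sample t-test for a
    standardized mean difference d, via a normal approximation."""
    se = math.sqrt(2.0 / n_per_group)          # SE of the estimated d
    z = d / se                                 # noncentrality
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    # P(reject) = P(Z > z_crit - z) + P(Z < -z_crit - z)
    return (1 - phi(z_crit - z)) + phi(-z_crit - z)

# Illustrative: d = 0.5 at the original median n (112 total, 56/group)
print(power_two_sample(0.5, 56))     # ≈ 0.75
# Half that effect at the ML2 median n (7157 total, ~3578/group)
print(power_two_sample(0.25, 3578))  # ≈ 1.0
```

Even at half the original effect size, a pooled sample of ML2's size leaves essentially no room for a true effect to go undetected.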
Third, each original finding was replicated in >60 samples with labs from 36 nations and territories taking part. If the effect was easy to detect in some samples and not others, ML2 would find evidence for that. psyarxiv.com/9654g
Fourth, some original authors offered a priori hypotheses of moderating influences based on sample, task order, or other design features. Identifying them a priori provided an opportunity to test them with confidence.
The top-line results, a 50% overall replication success rate and effect sizes less than half those of the original studies, are consistent with prior replication studies. psyarxiv.com/9654g
In sum, the ML2 failures to replicate are not due to failure to meet expert-review expectations, low power, heterogeneity of effects, hypothesized moderators, or task order. For example, Table 4 and Figure 3 show the variation in effect size by task order. psyarxiv.com/9654g
The main purpose of ML2 was to examine heterogeneity across sample & setting. Some heterogeneity was observed. It was mostly in large effects, not in weak effects. The notion that some “fragile” effects are highly sensitive to sample had no support here. psyarxiv.com/9654g
Heterogeneity was modest overall. For the most part, replicable effects replicated, some with varying strength; unreplicable effects did not replicate, no matter where they were studied. Weaker effects replicated at roughly the rate expected given statistical power. psyarxiv.com/9654g
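Heterogeneity in multi-lab data like this is typically quantified with a random-effects meta-analysis. A minimal sketch of the standard DerSimonian-Laird estimator, as a generic illustration rather than the actual ML2 analysis code (which is on OSF):

```python
def dersimonian_laird(effects, variances):
    """Q, tau^2 (between-sample variance), and I^2 (% of total
    variation due to heterogeneity) for per-lab effect estimates."""
    w = [1.0 / v for v in variances]
    mu = sum(wi * e for wi, e in zip(w, effects)) / sum(w)     # pooled effect
    Q = sum(wi * (e - mu) ** 2 for wi, e in zip(w, effects))   # heterogeneity stat
    df = len(effects) - 1
    C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - df) / C)                              # DL estimate
    I2 = max(0.0, 100.0 * (Q - df) / Q) if Q > 0 else 0.0
    return Q, tau2, I2

# Labs that agree up to sampling error -> tau^2 = 0, I^2 = 0
print(dersimonian_laird([0.20, 0.25, 0.30], [0.01, 0.01, 0.01]))
# Labs whose effects genuinely differ -> nonzero tau^2, large I^2
print(dersimonian_laird([0.10, 0.50, 0.90], [0.01, 0.01, 0.01]))
```

An I² near zero means labs disagree no more than sampling error predicts; a large I² means the effect genuinely varies by sample, which is what ML2 mostly found only for large effects.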
We also explored whether effects varied substantially between WEIRD and less WEIRD cultures. A couple of cases showed meaningful differences, but most did not.

I find this Figure S2 to be particularly stunning. psyarxiv.com/9654g
We do not know the generalizability of these findings. But they pose a notable challenge to the “hidden moderators” hypothesis: multiple moderators, plenty of power, little support. It is not compelling to assert a moderator explanation without testing it directly. psyarxiv.com/9654g
Also, we replicated evidence that surveys & markets can anticipate replication success. There are now 4 studies showing evidence for this.

ML2: psyarxiv.com/9654g
RPP: pnas.org/content/112/50…
EERP: science.sciencemag.org/content/351/62…
SSRP: nature.com/articles/s4156…
Congratulations to project leads @raklein3 @michevianello @fredhasselman for completing this massive effort, and to the 180+ collaborators who were essential to the project's success. The work was comparable to 29 papers. The long-term impact will make it worth it. psyarxiv.com/9654g
Finally, all data, materials, and code are available on OSF for review and reuse osf.io/8cd4r

Test moderators. Do exploratory analysis on a subset, then apply the model to a holdout sample to maximize the diagnosticity of statistical inferences. psyarxiv.com/9654g
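That split-sample workflow can be sketched as follows. The data and function names are hypothetical, and a normal approximation stands in for a proper t-test; the point is only the explore-then-confirm structure:

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def holdout_moderator_test(ys, gs, seed=1):
    """Pick the direction of a moderator effect (gs in {0, 1}) on a
    random exploratory half, then return a one-sided p-value for
    that pre-picked direction computed on the untouched holdout half."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    explore, confirm = idx[:half], idx[half:]

    def diff_and_se(rows):
        a = [ys[i] for i in rows if gs[i] == 0]
        b = [ys[i] for i in rows if gs[i] == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
        vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
        return mb - ma, math.sqrt(va / len(a) + vb / len(b))

    direction, _ = diff_and_se(explore)   # exploratory half: sign only
    diff, se = diff_and_se(confirm)       # holdout half: clean estimate
    z = (diff if direction >= 0 else -diff) / se
    return 1 - phi(z)                     # one-sided p, chosen direction
```

Because the direction is fixed before the holdout data are touched, the confirmatory p-value is not inflated by the exploratory search.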