Brian Nosek @BrianNosek

15 tweets, 10 min read

Many Labs 2: 28 findings, 60+ samples, ~7,000 participants per study, 186 authors, 36 nations.

Successfully replicated 14 of 28 psyarxiv.com/9654g

ML2 may be more important than Reproducibility Project: Psychology. Here’s why...

@michevianello @fredhasselman @raklein3

ML2 minimized boring reasons for failure. First, by using original materials & Registered Reports (cos.io/rr), all 28 replications met expert-reviewed quality control standards. Failure to replicate cannot easily be dismissed as replicator incompetence. psyarxiv.com/9654g

Second, the median total ML2 replication sample size (n = 7,157) was 64x the median original sample size (n = 112). If there was an effect to detect, even a much smaller one, we would detect it. Final estimates have very high precision. psyarxiv.com/9654g
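To see why samples that large leave little room for a "we lacked power" explanation, here is a rough sketch. It assumes, purely for illustration, a simple two-group design with equal group sizes, a small standardized effect (d = 0.1), two-sided alpha = .05, and a normal approximation to the test. Actual ML2 designs and analyses vary by study, and `approx_power` is a hypothetical helper, not ML2 code.

```python
from statistics import NormalDist


def approx_power(d: float, total_n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power for a two-group comparison.

    Illustrative assumptions: total_n split into two equal groups,
    standardized effect size d, normal approximation to the test statistic.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    n_per_group = total_n / 2
    # Noncentrality d * sqrt(n1 * n2 / (n1 + n2)) reduces to d * sqrt(n_per_group / 2)
    ncp = d * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region
    return z.cdf(ncp - crit) + z.cdf(-ncp - crit)


# Compare the median original sample size with the median ML2 total sample size
for n in (112, 7157):
    print(n, round(approx_power(0.1, n), 3))
```

Under these assumptions, a small effect is nearly invisible at n = 112 but is almost always detected at n = 7,157.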

Third, each original finding was replicated in >60 samples with labs from 36 nations and territories taking part. If the effect was easy to detect in some samples and not others, ML2 would find evidence for that. psyarxiv.com/9654g

Fourth, some original authors offered a priori hypotheses of moderating influences based on sample, task order, or other design features. Identifying them a priori made it possible to test them with confidence.

The top-line results, a 50% overall replication success rate and effect sizes less than half those of the original studies, are consistent with prior replication studies. psyarxiv.com/9654g

Across 6 large-scale replication projects, replication rate is 90 of 190 (47%).

ML1: econtent.hogrefe.com/doi/full/10.10…
ML2: psyarxiv.com/9654g
ML3: sciencedirect.com/science/articl…
SSRP: nature.com/articles/s4156…
EERP: science.sciencemag.org/content/351/62…
RPP: science.sciencemag.org/content/349/62…

In sum, in ML2, the failures to replicate are not due to failure to meet expert review expectations, low power, heterogeneity of effects, hypothesized moderators, or task order. For example, Table 4 & Figure 3 show variation in effect size by task order. psyarxiv.com/9654g

The main purpose of ML2 was to examine heterogeneity across samples & settings. Some heterogeneity was observed, but mostly among large effects, not weak ones. The notion that some “fragile” effects are highly sensitive to sample had no support here. psyarxiv.com/9654g

Heterogeneity was modest overall. For the most part, replicable effects replicated, some with varying strength. Unreplicable effects didn’t replicate, no matter where they were studied. Weaker effects replicated at roughly the rate expected given each sample's power. psyarxiv.com/9654g

We also explored whether effects varied substantially between WEIRD and less WEIRD cultures. A couple of cases showed meaningful differences, but most did not.

I find Figure S2 particularly stunning. psyarxiv.com/9654g

We do not know how well these findings generalize. But they provide a notable challenge to the “hidden moderators” hypothesis: multiple moderators, plenty of power, little support. It is not compelling to assert a moderator explanation without testing it directly. psyarxiv.com/9654g

Also, we replicated evidence that surveys & markets can anticipate replication success. There are now 4 studies showing evidence for this.

ML2: psyarxiv.com/9654g
RPP: pnas.org/content/112/50…
EERP: science.sciencemag.org/content/351/62…
SSRP: nature.com/articles/s4156…

Congratulations to project leads @raklein3 @michevianello @fredhasselman for completing this massive effort & to the 180+ collaborators who were essential to the project's success. The work was comparable to 29 papers. The long-term impact will make it worth it. psyarxiv.com/9654g

Finally, all data, materials, and code are available on OSF for review and reuse osf.io/8cd4r

Test moderators. Do exploratory analysis on a subset of the data, then apply the resulting model to a holdout sample to maximize the diagnosticity of statistical inferences. psyarxiv.com/9654g
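A minimal sketch of the split-sample idea, assuming the data are a list of rows (e.g., participants or labs). The helper name `split_exploratory_holdout`, the 30% exploratory fraction, and the seed are all hypothetical choices for illustration and are not taken from ML2. The idea is to search for moderators only in the exploratory part, fix the model, and then test it once on the untouched holdout.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_exploratory_holdout(
    rows: Sequence[T], explore_frac: float = 0.3, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Randomly partition rows into an exploratory subset and a holdout subset.

    explore_frac and seed are illustrative assumptions; fixing the seed
    makes the split itself reproducible and auditable.
    """
    if not 0 < explore_frac < 1:
        raise ValueError("explore_frac must be strictly between 0 and 1")
    rng = random.Random(seed)  # local RNG so global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_frac)
    # Explore on the first part; confirm on the rest, which stays untouched
    return shuffled[:cut], shuffled[cut:]


explore, holdout = split_exploratory_holdout(range(1000))
print(len(explore), len(holdout))
```

Pre-registering the split, and the rule for which exploratory model gets carried forward, keeps the holdout test confirmatory rather than another round of exploration.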
