Our prospective replication study released!

5 years: 16 novel discoveries get round-robin replication.

Preregistration, large samples, transparency of materials.

Replication effect sizes 97% the size of confirmatory tests!

psyarxiv.com/n2a9x

Lead: @JProtzko 1/
When teams made a new discovery, they submitted it to a prereg’d confirmatory test (orange).

Each confirmatory test was then subjected to 4 replications (Ns ~ 1,500 each).

The original team wrote a full methods section. The other teams conducted independent replications (green) and the original team ran a self-replication (blue).
Based on confirmatory effect sizes and replication sample sizes, we’d expect 80% successful replications (p<.05). We observed 86%.

Exceeding the replication rate expected from power is surely due to chance. But the outcome clearly indicates that high replicability is achievable.
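To make the power logic concrete, here's a minimal sketch (not the paper's actual analysis) of estimating an expected replication rate as the average power of a two-sample test at the replication sample size, given each confirmatory effect size. The effect sizes and per-group n below are hypothetical placeholders.

```python
# Hedged sketch: expected replication rate as mean statistical power.
# Effect sizes and n are hypothetical placeholders, not the paper's data.
from statsmodels.stats.power import TTestIndPower

confirmatory_d = [0.15, 0.25, 0.40, 0.55]  # hypothetical Cohen's d per finding
n_per_group = 750                          # ~1,500 total per replication

analysis = TTestIndPower()
expected_rate = sum(
    analysis.power(effect_size=d, nobs1=n_per_group, alpha=0.05, ratio=1.0)
    for d in confirmatory_d
) / len(confirmatory_d)

print(f"Expected replication rate (mean power): {expected_rate:.2f}")
```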
Another way to look at the data is comparing replication effect sizes (3 independent=blue; self=orange) to the confirmatory test effect size (0.0).

We observe little systematic variation; self- and independent replications were very similar to confirmatory outcomes on average.
We also tested whether the discoveries were trivial or highly predictable, perhaps making them "easy" to replicate. We did not observe this: a survey of researchers showed high heterogeneity in predictions, indicating that these novel findings did not have high prior odds.
This was the work of four labs. The PIs are @JonathanSchool6 (UCSB), Leif Nelson (Berkeley), Jon Krosnick (Stanford), and me (UVA), with many lab collaborators like @CharlieEbersole @jordanaxt @NickButtrick in mine.
This isn’t a randomized trial of adopting high rigor versus not. We designed the study to evaluate decline effects, but we did not observe them. To us, the most plausible explanation for the absence of decline is some combination of the rigor-enhancing methods we adopted.
This is an existence proof that high replicability is achievable in the social-behavioral sciences with novel discoveries. The next step is to unpack the causal role of methodological interventions in improving replicability and to explore generalizability across topics and methodologies.
Here is one more visualization of the data showing the lack of systematic decline from the confirmatory study (0) through the replication studies (1-4) for each of the 16 novel findings produced by the labs over the last 5 years.

Congrats to @JProtzko & team for completing this beast!
If you prefer to receive your papers in video form, here's a presentation by @JonathanSchool6 of the preliminary results at last year's Metascience 2019 meeting. metascience.com/events/metasci…

More from @BrianNosek

25 May 19
Happy to elaborate. Think of preregistration of analysis plans as hypothesizing, data analysis, and scenario planning all rolled into one and without knowing what the data are. This creates a novel decision-making situation. 1/
For example, the first time preregistering an analysis plan, many people report being shocked at how hard it is without seeing the data. It produces a recognition that our analysis decision-making (and hypothesizing) had been much more data contingent than we realized. 2/
Without the data, there is a lot of new mental work to articulate precisely what the hypothesis is and how the data could be used to evaluate that hypothesis. My odd experience was believing that I had been doing that all along, w/out realizing that I used so much discretion. 3/
8 Jan 19
Some predictions about whether a researcher's ideology affects their likelihood of replicating a prior result. ht @jayvanbavel

First, I have no doubt that ideology CAN influence replicability. Classic Rosenthal work + more provides good basis.

So, under what conditions?
1. Ideology may guide selection of studies to replicate. More likely to pursue implausible X because it disagrees with my priors; and pursue plausible Y because it agrees with my priors.

On balance, this may be a benefit of ideology to help with self-correction and bolstering.
2. Ideology may shape design of studies. More likely to select design conditions to fail if I don't like the idea; more likely to select design to succeed if I like the idea.

This is a problem because of the tendency to overgeneralize from limited conditions to the phenomenon. But,
19 Nov 18
Many Labs 2: 28 findings, 60+ samples, ~7000 participants each study, 186 authors, 36 nations.

Successfully replicated 14 of 28 psyarxiv.com/9654g

ML2 may be more important than Reproducibility Project: Psychology. Here’s why...

@michevianello @fredhasselman @raklein3
ML2 minimized boring reasons for failure. First, using original materials & Registered Reports (cos.io/rr), all 28 replications met expert-reviewed quality control standards. Failure to replicate is not easily dismissed as replication incompetence. psyarxiv.com/9654g
Second, the total ML2 replication median sample size (n = 7157) was 64x original median sample size (n = 112). If there was an effect to detect, even a much smaller one, we would detect it. Ultimate estimates have very high precision. psyarxiv.com/9654g
27 Aug 18
We replicated 21 social science experiments in Science or Nature. We succeeded with 13. Replication effect sizes were half of originals. All materials, data, code, & reports: osf.io/pfdyw/, preprint socarxiv.org/4hmb6/, Nature Human Behavior nature.com/articles/s4156…
Using prediction markets we found that researchers were very accurate in predicting which studies would replicate and which would not. (blue=successful replications; yellow=failed replications; x-axis=market closing price) socarxiv.org/4hmb6/ nature.com/articles/s4156… #SSRP
Design ensured 90% power to detect an effect size half as large as original study. Replications averaged 5x the sample size of originals. We obtained original materials in all but one case, and original authors provided very helpful feedback on design. socarxiv.org/4hmb6/
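As a rough illustration of that design rule, here's a minimal sketch (not the project's actual planning code) of solving for the per-group sample size that gives 90% power to detect half of a hypothetical original effect size; the original d below is a placeholder.

```python
# Hedged sketch: n per group for 90% power at half a hypothetical original effect.
from statsmodels.stats.power import TTestIndPower

original_d = 0.50            # hypothetical original Cohen's d (placeholder)
target_d = original_d / 2    # power the replication for half that size

n_per_group = TTestIndPower().solve_power(
    effect_size=target_d, power=0.90, alpha=0.05, ratio=1.0
)
print(f"Required n per group: {n_per_group:.0f}")
```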
