Happy to announce that our paper, "Predicting the replicability of social science lab experiments," is published in PLOS ONE (journals.plos.org/plosone/articl…). We study whether replication outcomes can be predicted by an algorithm. What do you think: better or worse than experts? (Answer below.)
Replications are very costly. If we can accurately predict their outcome, we can focus scientific resources on replicating only certain papers. For example, a journal could use an algorithmic decision rule to decide which submissions need to be replicated before acceptance.
Going back to Meehl (1954), it has been shown, in domain after domain, that statistical rules can be more accurate than professional judgment. We collected statistics on 131 study-replication pairs and trained a Random Forest algorithm to predict two measures of replication.
In a held-out validation set, the model predicts binary replication (a significant effect in the same direction as the original) with 70% accuracy, and relative effect size with a correlation coefficient of 0.38. Not too bad!
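If you want to play with the general idea, here is a minimal sketch of this kind of pipeline. This is not our actual code: the data are synthetic and the feature names are illustrative placeholders, but the scikit-learn Random Forest setup with a held-out split mirrors the structure of what we do.

```python
# Minimal sketch (NOT the paper's pipeline): predict binary replication
# and relative effect size from study-level features.
# Data and feature names below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 131  # number of study-replication pairs, as in the paper

# Hypothetical features: original p-value, sample size, effect size, citations
X = np.column_stack([
    rng.uniform(0.001, 0.05, n),    # original p-value
    rng.integers(20, 500, n),       # original sample size
    rng.uniform(0.1, 1.0, n),       # original effect size
    rng.integers(0, 1000, n),       # citation count
])
y_binary = rng.integers(0, 2, n)        # replicated (same-direction significant)?
y_relative = rng.uniform(-0.5, 1.5, n)  # relative effect size

# Held-out split for validation
X_tr, X_te, yb_tr, yb_te, yr_tr, yr_te = train_test_split(
    X, y_binary, y_relative, test_size=0.3, random_state=0)

# One forest per prediction target
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, yb_tr)
reg = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, yr_tr)

print("held-out accuracy:", accuracy_score(yb_te, clf.predict(X_te)))
print("held-out correlation:", np.corrcoef(yr_te, reg.predict(X_te))[0, 1])
```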
To make sure our results were not just overfitting, we preregistered algorithmic predictions before the replications in Camerer et al. (2018) were run and compared them to the forecasts of experts. Accuracy is about as good as the aggregated beliefs of the prediction market!
We study which study characteristics help predict replication. Fig. 4 shows the features, ranked by their relative importance for predicting binary replication. Statistical attributes like the original p-value are important, but so are higher-level features like citations.
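Continuing the sketch above (reusing `clf`; again, these feature names are placeholders, not our full feature set), this is roughly how such an importance ranking is read off a fitted forest:

```python
# Continuation of the earlier sketch: rank placeholder features by
# Random Forest importance, analogous in spirit to the paper's Fig. 4.
feature_names = ["original p-value", "sample size", "effect size", "citations"]
for name, imp in sorted(zip(feature_names, clf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```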
We see our paper as a proof of concept: the training and test samples are too small to confidently recommend that the algorithm be applied in the wild. However, the field moves fast, and there are already multiple new datasets that could be incorporated to further improve accuracy.
In particular, the replication and prediction efforts within the DARPA SCORE project (@OSFramework, @ReplicationMkts) will provide the scale needed to properly validate our results.
Last, I would like to thank all the researchers who conducted the replications that we use to train our model. This paper relies entirely on the hard work of researchers in the Reproducibility Project: Psychology as well as Many Labs 1 and 3.
And I should probably tag my coauthors :). It's been such a privilege working with you! @gidin @CFCamerer @e_forsell and everyone else who I don't think is on here.