Scientists are rewarded for novelty and worry about being “scooped” by competitors. Can such competition harm the scientific process by causing rushed research? Can “scoop protection” help? Very excited to share a new model (osf.io/preprints/meta…) in which we tackle these q's.
We wanted to test the hypothesis that incentivizing priority of discovery (i.e., larger rewards for novel findings) harms scientific reliability by causing scientists to “rush” their work. To do so, we built an evolutionary agent-based model of competition for priority in science
In our model, scientists acquire data on problems and are rewarded based on their priority of publication. Scientists can increase statistical power by increasing sample size, but larger samples cost more time, which leaves scientists vulnerable to being scooped by competitors.
We simulated a population of scientists, each of whom is characterized by two parameters: the sample size of their research studies, and their probability of abandoning a research question when another scientist publishes on that question.
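For readers who like code, here's a minimal sketch (in Python) of how an agent with these two traits might be represented. The class and field names are my own illustration under those assumptions, not the paper's actual implementation (the real code is linked at the end of the thread).

```python
from dataclasses import dataclass

@dataclass
class Scientist:
    sample_size: int     # n per study: more power, but more time per study
    p_abandon: float     # probability of abandoning a problem after being scooped
    payoff: float = 0.0  # rewards accumulated within a generation
```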
We assumed that scientists could transmit their methods to trainees, so the distributions of these parameters could evolve across generations. Other models (royalsocietypublishing.org/doi/full/10.10…, sciencedirect.com/science/articl…) by @psmaldino and @cailinmeister have used a similar approach.
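Continuing that sketch, payoff-biased transmission of methods to trainees with a bit of mutation might look roughly like this. The selection rule and mutation size here are placeholders, not the paper's exact update rule.

```python
import random

def next_generation(population, mutation_sd=0.05):
    """Trainees copy the methods of successful scientists, with some noise."""
    weights = [max(s.payoff, 1e-6) for s in population]  # successful methods are copied more often
    parents = random.choices(population, weights=weights, k=len(population))
    offspring = []
    for p in parents:
        n = max(2, round(random.gauss(p.sample_size, p.sample_size * mutation_sd)))
        ab = min(1.0, max(0.0, random.gauss(p.p_abandon, mutation_sd)))
        offspring.append(Scientist(sample_size=n, p_abandon=ab))
    return offspring
```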
We were particularly interested in how changing the intensity of competition (e.g., more competitors, larger rewards for novel results) affected the equilibrium sample sizes that populations evolved to.
For example, when rewards for novelty are high, what types of research strategy “wins out”? Who gets the highest payoff? Scientists who conduct large, time-intensive studies? Or scientists who conduct many small, underpowered studies?
We were also interested in the consequences of individual scientists’ behaviors for the efficiency and reliability of science. For example, what happens to PPV (the proportion of positive results that correspond to a true effect)?
Or what happens to scientists’ beliefs about the epistemic status of effects? Do scientists necessarily converge upon “true” beliefs about whether an effect exists or not?
Here I'll summarize some of the main results for both individual-level and population-level outcomes. I'll then give my take on why I think these are interesting, as well as discuss some limitations of our model
KEY RESULTS

The below figure summarizes a few important results.
1) More competitors cause the cultural evolution of smaller sample sizes. This occurs because more competition = a larger probability that any given scientist will be scooped, which favors scientists who quickly conduct small, underpowered studies.
2) Larger rewards for “scooped” publications promote larger sample sizes. Larger benefits to publishing scooped results allow the scientists who are most likely to get scooped (i.e., those with larger sample sizes) to receive larger payoffs. This reduces the relative payoff difference between scooped scientists and those who are fastest to finish sampling (i.e., those with smaller sample sizes).
3) Larger benefits to negative results promote smaller sample sizes. This occurs because scientists have little incentive to conduct large studies: a small, underpowered study usually produces a negative result, but that result is worth just as much as a negative result from a larger, well-powered study. However, conducting many small studies produces results at a higher rate than conducting fewer large studies. This favors scientists who conduct studies with smaller sample sizes.
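Here's a back-of-the-envelope illustration of that rate argument. The cost numbers are illustrative, not the model's parameters.

```python
# Results are produced at rate 1 / (time per study); time per study is the
# startup cost plus the per-sample cost times n.
def results_per_unit_time(n, cost_per_sample=1.0, startup_cost=10.0):
    return 1.0 / (startup_cost + n * cost_per_sample)

print(results_per_unit_time(n=20))   # ~0.033 results per unit time (small studies)
print(results_per_unit_time(n=200))  # ~0.005 results per unit time (large studies)
```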
We were also interested in potential solutions. That is, which scientific reforms promote the evolution of larger equilibrium sample sizes (i.e., higher-quality research)?
Our model demonstrates two potential solutions.
First, “scoop protection” promotes the evolution of larger equilibrium sample sizes (see above figure). This result provides theoretical support for the logical coherence of the “scoop protection” reforms at @eLife and @PLOSBiology.
See journals.plos.org/plosbiology/ar… and elifesciences.org/articles/30076 for these journals' recent descriptions of their "scoop protection" policies
We find that allowing “scooped" scientists to receive some payoff reduces the incentive to run small-sample size studies in order to increase the probability of being first to publish a result.
Second, we find that larger startup costs promote larger sample sizes.
Large startup costs disincentivize scientists from pursuing such a “quantity” strategy of running many small studies in the hopes of getting at least 1 statistically significant result.
This occurs because scientists pay a time cost every time they start (or restart) a study, and scientists who attempt to run many small studies pay such a cost more often than scientists who run fewer larger studies.
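Extending the toy calculation above: raising the startup cost shrinks the rate advantage of many small studies over fewer large ones. Numbers are again illustrative (400 matches the larger startup cost discussed later in the thread).

```python
# How much faster does a small-study strategy produce results than a
# large-study strategy, as a function of the startup cost?
def rate_advantage(n_small, n_large, startup_cost):
    return (startup_cost + n_large) / (startup_cost + n_small)

print(rate_advantage(20, 200, startup_cost=10))   # ~7.0x advantage for small studies
print(rate_advantage(20, 200, startup_cost=400))  # ~1.4x advantage
```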
This finding suggests that startup costs are one solution to the problem of scientific unreliability. Coincidentally, some current reforms have inadvertently introduced such costs.
For example, preregistration and registered reports make researchers spend more time thinking about and designing protocols before running investigations.

cos.io/prereg/

cos.io/rr/
The time cost inherent in these practices is often conceptualized as an inconvenience. However, our model implies that such costs have an important function: they incentivize scientists to conduct higher-quality research than they would otherwise*
*but see below for caveats of slowing down scientists
POPULATION-LEVEL OUTCOMES
The population-level outcomes that we investigated suggested that the factors that promote larger sample sizes were also beneficial for science as a whole (But again, see below for major caveats).
We looked at two outcomes: PPV (the proportion of positive results that correspond to true effects) and the proportion of questions with more true than false results. Larger equilibrium sample sizes mean a larger PPV and a higher proportion of questions with more true results.
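For reference, here's the standard PPV calculation given statistical power, the false-positive rate alpha, and the base rate of true effects. The alpha and base-rate values below are illustrative, not the model's parameters.

```python
# PPV = (power * prior) / (power * prior + alpha * (1 - prior)),
# where prior is the base rate of true effects among tested hypotheses.
def ppv(power, prior, alpha=0.05):
    return (power * prior) / (power * prior + alpha * (1 - prior))

print(ppv(power=0.8, prior=0.2))  # ~0.80 with well-powered studies
print(ppv(power=0.2, prior=0.2))  # ~0.50 with underpowered studies
```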
We also looked at the proportion of time that scientists spent collecting data that eventually went on to be published, as opposed to being file-drawered when scientists abandon research problems.
We found that large costs to being scooped made scientists more likely to abandon problems upon being scooped, which reduced population efficiency.
Basically, when the costs to being scooped are high, scientists quickly abandon research on a problem once they are scooped, and their research up to that point is wasted.
We also computed several other population-level outcomes, which we describe in the supplementary materials (osf.io/cbftz/)
One big caveat here is that the same factors that promote larger sample sizes (e.g., scoop protection, startup costs) cause scientists to investigate fewer total questions. So, there’s a quality-quantity tradeoff here, and our model is not well suited to assess the optimal level of competition or scoop protection given the tradeoff between reliability and productivity.
AN INTERESTING UNEXPECTED FINDING
One population-level outcome that we assessed was scientists’ average per-study change in belief. We assumed that scientists used Bayesian updating to update their beliefs about whether a true effect existed or not, based on the published literature.
However, we assumed that scientists only know the average statistical power of their studies. We think this is reasonable: if scientists knew the exact statistical power of their study, they would need perfect information about each effect size, which is both unrealistic and would mean that conducting a research study is unnecessary in the first place.
Will such Bayesian scientists inevitably acquire accurate beliefs about whether each research question has a true or null effect?
Interestingly, we found that there are some conditions when the average study shifts scientists’ beliefs in the “wrong” direction. Specifically, when effect sizes are small, scientists come to believe that there is no true effect, even though a true effect exists.
This occurs because, when effect sizes are small, scientists overestimate their statistical power and their beliefs are more strongly influenced by the large number of false-negative results than they would be if scientists had perfect information about their statistical power.
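A toy calculation of this mechanism (my own illustration; the power and alpha values are placeholders, not the paper's parameters): the expected per-study shift in the log-odds of "a true effect exists" can be negative even when an effect exists, if scientists update using the average power rather than the true power.

```python
import math

def expected_belief_shift(true_power, assumed_power, alpha=0.05):
    """Expected per-study change in the log-odds that 'a true effect exists',
    when the effect really does exist but scientists update their beliefs
    using assumed_power instead of the (unknown) true_power."""
    update_after_positive = math.log(assumed_power / alpha)
    update_after_negative = math.log((1 - assumed_power) / (1 - alpha))
    # Positive results occur at rate true_power; negatives at rate 1 - true_power.
    return true_power * update_after_positive + (1 - true_power) * update_after_negative

print(expected_belief_shift(true_power=0.15, assumed_power=0.50))  # ~ -0.20: beliefs drift the wrong way
print(expected_belief_shift(true_power=0.15, assumed_power=0.15))  # ~ +0.07: beliefs drift the right way
```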
So, our model demonstrates yet another mechanism by which science fails to self-correct, adding to the growing number of studies on this topic (see the paragraph attached).
LIMITATIONS
As with any model, there is always a garbage-in garbage-out problem. If the assumptions of the model don’t adequately capture key features of the real-world problem, then it’s not clear how much insight the model provides into the problem.
(Biased perspective here.) I think that our model does a good job of capturing some key features of real-world priority races (e.g., quality-quantity tradeoffs; variable numbers of competitors; biases against negative results) and of the reforms that we purport to model (e.g., scoop protection as increasing the rewards to scientists who aren’t first to publish).
That said, we make several assumptions that are unrealistic, and it’s not yet clear how our qualitative results depend on these.
For example, we find that rewarding negative results harms science. However, our model assumes that the value of a result does not depend on the quality of the conducted study (i.e., sample size).
This assumption often doesn’t correspond to the real world, where high-quality negative results (e.g., large sample size direct replications that find a null result) are probably more likely to be published, and are cited more, than an underpowered study that finds a null.
Even though our assumption is unrealistic, it is important that we understand the logical consequences of a system in which payoffs are independent of study quality. If negative results are rewarded regardless of quality, there is no incentive to conduct high-quality research.
Finally, some highly-cited existing models, such as the Natural Selection of Bad Science, also make this assumption, and honestly, I don’t think it is a good one. It basically assumes that all that matters is whether you publish and whether your result is positive or negative. It ignores things like the incentive to do work that acquires large numbers of citations.
So, I think that future models should relax this assumption. Something to work towards as we strive to build a theoretical foundation for understanding how incentives affect scientists’ behavior and science as a whole.
Our model makes several other assumptions that I’d like to highlight. We assume that scientists can only publish once on a given effect before moving on to a new question. And we assume that all research questions are independent and their solutions equally valuable, whereas real-world science consists of interconnected questions with possibilities for intermediate solutions before solving big problems.
for example, see arxiv.org/abs/1605.05822
So, our model does a better job of capturing a situation in which scientists work on well-defined problems with a priori correct answers, and doesn’t do a good job of capturing how scientists choose scientific problems, or the creative component of science.
A FINDING WE DON'T YET UNDERSTAND
In our model, we arbitrarily fixed population size at 120, assuming that it wouldn’t affect the qualitative results. When getting feedback from fellow scientists, Karthik Panchanathan pointed out that this population size might be small enough that our qualitative findings would be affected by genetic drift (i.e., the larger effects of randomness in smaller populations).
He pointed to a case in evolutionary biology where the qualitative patterns found in a simulation of indirect reciprocity turned out to be substantially affected by the high levels of genetic drift in the original simulation:

nature.com/articles/31225
royalsocietypublishing.org/doi/abs/10.109…
As a check, we ran our simulation with a pop size of 1000, for a subset of parameters. The sample size results were robust to this check. However, for a small subset of the results regarding scientists’ probability of abandonment, we found a qualitatively different pattern.
For example, compare the probability of abandonment when population size = 120 (first image) to when population size = 1000 (second image). The difference occurs when there are two competitors.
In the popsize = 120 case, populations tend to evolve to abandon problems when decay =10, even when there are only two competitors. But in the popsize = 1000 case, populations evolve towards never abandoning, no matter what.
We found a similar pattern when startup costs were larger (400). In the popsize = 120 case, when the costs to being scooped were large, populations always evolve towards abandoning upon being scooped if there are many competitors.
In the popsize =1000 case, some populations evolve towards never abandoning, while others drift around more intermediate values.
We could not figure out the cause of this pattern, but would be interested to hear your thoughts about what might be generating it.
CONCLUSION
Scoop protection reforms are logically coherent, as are arguments that competition for priority can harm the efficiency and reliability of science. Registered reports increase startup costs, and our model suggests that this will increase scientific reliability.
However, reliability comes at the cost of productivity (at least, in a world where getting one problem wrong doesn’t make the next problem harder to solve) so we need to determine what the optimal tradeoff is between these things.
My collaborators (Thomas Morgan and Minhua Yan at Arizona State University) and I have been working on this model for a few years now, so I’m thrilled to finally be able to get this out into the world and hear your feedback.
Finally, although we are a far cry from having a deep understanding of how incentives affect the scientific process, we will make progress much more quickly if we move from verbal arguments about improving science to more formal modeling.
This isn’t a new point and has been on many people’s radar recently, so I don’t have much to add.
However, the fact that many people in the social sciences are now taking this seriously, and that the metascience community has already produced quite a few models of the scientific process, makes me optimistic that our field is on its way towards developing a solid theoretical foundation for understanding the scientific process and how to make it better.
All code and simulated data here: osf.io/cbftz/
Tagging some folks who may be interested in the paper:
@wfrankenhuis1 @EikoFried @IrisVanRooij @zerdeve @STWorg @rlmcelreath @CT_Bergstrom @jevinwest @KevinZollman @RemcoHeesen @mmuthukrishna @siminevazire. I'm sure there are many others who I'm blanking on - sorry in advance.
Finally, thanks to everyone who provided feedback on an earlier form of this paper in one form or another. Dan Hruschka, @NicoleWalasek, Karthik Panchanathan, @psmaldino, @wfrankenhuis1, Rob Boyd, @CT_Bergstrom , @lakens , @peder_isager, @annemscheel, Jesse Fenneman.
and again, I'm sure there are others, and I apologize if I forgot to mention you, but thanks for your input. All have helped to substantially improve the paper.
Nap time. End thread.
Addendum 1: The image for the effect of startup costs didn't display the axis. The below pair of images make clearer what the colors mean: