I feel like doing a brief live-tweet reading thread of this paper: A Validity-Based Framework for Understanding Replication in Psychology

journals.sagepub.com/doi/abs/10.117…

1/
Juicy take on why replication attempts *really* fail: The phenomenon I studied was super complex and you probably didn't do something correctly.

2/
Their proposal is a(nother!) framework or "lens" for evaluating failed replications across 4 types of validity, to identify the ways a replication study may have differed from the original that could explain the failed replication.

3/
First up is statistical conclusion validity. In other words, was the original finding a fluke? Issues of power are central here, as sample size norms were notoriously low until recently, when large-N studies began to be advocated for. Avg power in psych is only 36%. 4/
The authors agree that power is important, but it's important to remember "that statistical conclusion validity is only 1 of the 4 categories of explanations for non-replication." Thus, let's move on.

5/
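For a rough sense of what that 36% figure means in practice, here's a quick sketch (my own illustration, not from the paper) using statsmodels' power solver for a two-sample t-test; the effect size d = 0.4 and the sample sizes are assumed values:

from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()

# Per-group sample size needed to hit the conventional 80% power target
# (assuming d = 0.4, alpha = .05, two-sided):
n_needed = solver.solve_power(effect_size=0.4, alpha=0.05, power=0.80)
print(f"n per group for 80% power at d = 0.4: {n_needed:.0f}")  # ~99

# Power achieved by a historically typical n = 32 per group at the same
# assumed d lands close to that ~36% average:
achieved = solver.solve_power(effect_size=0.4, nobs1=32, alpha=0.05)
print(f"power with n = 32 per group: {achieved:.2f}")  # ~0.35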
This section seems a bit dismissive of the fact that a study of 50 is less useful than a study of 5000, and of the notion that there's a strong case for the role of QRPs (a la @JkayFlake) in the replication crisis. These are mentioned, but referred to as "beliefs" 6/
Next up is internal validity -- is there really a causal relation between the IV and DV?

WRT replication failures, either the original study or the replication study effed up here. Both are possible! BUT....

7/
...bc the replication attempt is different from the original, it's more likely that the replication study effed up. The next paragraphs discuss how online studies are less valid in this respect than OG lab studies.

So, the replicators more likely did something wrong, not me. 8/
Overall, this section rests on IMO flawed assumptions: that OG lab studies are inherently more internally valid than other means of data collection, and that the original study is less likely to be affected by threats to internal validity than replications (why?)

9/
Next up, construct validity. A massive problem in psychology given the nature of the unobservable things we try to study. Let's see what they have to say here.

Hypothesis (haven't read past here yet): that older studies have better construct validity than newer studies?

10/
More balanced than I anticipated based on the previous two! Poor measurement and lack of psychometric testing are a big problem. They also mention that using the exact same measure from an old study may not be culturally relevant anymore in a replication.

11/
And I think this issue gets more attention than they allude to. Measurement and construct validity issues have become a huge topic of conversation.

And I would argue that recent studies and scale development do a hell of a lot better job now than 20 years ago.

12/
The burden seems to fall on the replicators: THEY must do all the work to prove that the original's under-reported, made-for-this-JPSP-study measure is valid. How many papers from 15 years ago report the level of info now required of papers and replications? 13/
Lastly, external validity -- how well do effects generalize across situations, contexts, or populations?

I would argue this is a huge point. If an effect only occurs in a niche sample, then what is the practical utility of the knowledge, generally speaking?

14/
Counter: Authors usually assume, unless stated otherwise, that the effects will generalize. If they thought otherwise they'd explicitly say so. But they don't, bc their verbal theory offers no predictions. But when a replication fails it's bc of external validity problems -- your problem. 15/
These are valid points, but at what point CAN these authors say that a replication failed bc the original effect was a fluke?
This is a sly way to just slide past all the high-powered, large-scale replication efforts that show that "hidden moderators" and lab effects seem to play a minimal role in replication failures.

Also, "advocates for prominent role of replication in psych". okk.

17/
The core argument is that replication failures can plausibly be accounted for by the four validity criteria discussed.

The problem as I see it, however, is that the authors are failing to recognize the validity problems in the original studies (!!!)

All is well here...

18/
A big point they make is that most replication focus is on statistical conclusion validity = power, QRPs, etc.

But if you don't have adequate power & transparent practices, what's even the point? It seems like stat conclusion validity is a prerequisite for everything else.

19/
Here they try to show how an unbalanced focus on statistical conclusion validity might be justified, but they definitely disagree.

20/
Finally, what are the implications of this approach? First, the asymmetry of focus on validity issues is a problem: it may make researchers unfocused on issues besides sample size and could actively worsen other aspects of validity.

21/
This hits on my point a few posts up. If you MUST reduce your sample size to an underpowered number to get the best measurement & replicate the original perfectly, what are the implications? Team science IMO shows it's possible to have high N and good studies. 22/
They suggest that the field needs to focus on making original research more replicable by focusing on these areas of validity.

...aren't we doing that? Is that not what @OSFramework and @improvingpsych are doing? It's not all just replications...

23/
Replicators need to do better to really understand the complex original studies. Big-lab replication efforts undermine other forms of validity and aren't knowledgeable enough about the original work.

24/
The final word. 25/
Alright. Overall thoughts about this paper and its argument.

I appreciate and respect the authors' strong stance here, and criticism of current directions in our field is great.

I agree with their primary thesis that more attention can be paid to other forms of validity.

26/
That said, I think that the authors have fatal flaws in their assumptions.

They make implicit & explicit statements that original work is more valid than replications. I disagree. This misses the point that older work is lacking in psychometrics, power, transparency, etc. 27/
Finally, I disagree that the asymmetry problem is as bad as they claim. Without statistical power and transparent research practices, most work is a non-starter, especially in non-clinical work. 28/
I also dislike that the burden is on the replicators throughout the paper, with little attention to the inherent flaws in older psychology work. This dovetails with their assumption that original work is generally better than replication work. 29/
Overall, it's an interesting paper with good points that merits discussion. Well done!

Thanks for reading. This was not as brief as I originally thought. Interested to hear others' thoughts on this paper. Cheers.

/END