1⃣ Exploratory hypothesis tests - unplanned tests of post hoc hypotheses that may be based on the current results, and
2⃣ Confirmatory hypothesis tests - planned tests of a priori hypotheses that are independent from the current results
This distinction is supposed to be useful because exploratory results are assumed to be more “tentative” and “open to bias” than confirmatory results.
We challenge this assumption and argue that exploratory results can be more compelling than confirmatory results.
In the first part of our article, we demonstrate that the same data can be used to generate and test a hypothesis in a transparently valid manner.
We agree that circular reasoning can invalidate some exploratory hypothesis tests. However, circular reasoning can be identified by checking the *contents* of the reasoning without knowing the *timing* of that reasoning (i.e., a priori or post hoc).
In the second part of our article, we argue that exploratory hypothesis tests can have several evidential *advantages* over confirmatory tests and, consequently, they have the potential to deliver more compelling research conclusions. In particular,...
Exploratory hypothesis tests:
✅ avoid researcher commitment and prophecy biases
✅ reduce motive for data fraud
✅ are more appropriate following unplanned deviations
✅ facilitate inference to the best explanation
✅ allow peer reviewers to contribute to exploratory analyses
Finally, in the third part of our article, we consider several potential *disadvantages* of exploratory hypothesis tests and conclude that these potential disadvantages may not be problematic. In particular,…
Exploratory hypothesis tests are not necessarily disadvantaged due to:
✅ overfitting
✅ bias
✅ HARKing
✅ unacceptable research practices
And they:
✅ are usually necessary
✅ can be falsified
✅ can predict anything but may suffer an evaluative cost in doing so
To be clear, our claim is not that exploratory hypothesis tests are *always* more compelling than confirmatory tests or even that they are *typically* more compelling. Our claim is only that exploratory tests *can be* more compelling in specific research situations.
More generally, we encourage researchers to evaluate specific tests and results on a case-by-case basis rather than to follow simplistic heuristics such as “exploratory results are more tentative,” which represents a form of methodolatry.
Finally, we’d like to thank our peer reviewers and the editorial team at Philosophical Psychology @JournalPHP for providing an efficient and constructive review process that greatly helped to improve the quality of our article. 🙏
Rubin, M., & Donkin, C. (2022). Exploratory hypothesis tests can be more compelling than confirmatory hypothesis tests. Philosophical Psychology. doi.org/10.1080/095150…
Here’s a (non-exhaustive) list of 10 (potentially compatible?) explanations for the replication crisis.
THREAD >>>
(1) Low statistical standards of evidence
“Statistical standards of evidence for claiming new discoveries in many fields of science are simply too low” (Benjamin et al., 2018). doi.org/10.1038/s41562…
(2) Low statistical power
“So-called failures to replicate may not be failures at all, but rather are the result of low statistical power in single replication studies” (Maxwell, 2015). psycnet.apa.org/record/2015-39…
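A minimal simulation sketch of Maxwell’s point (my own illustration, not part of the original thread; the effect size d = 0.2 and n = 50 per group are assumed values): even when an effect is real, most replication attempts at this sample size will fail to reach p < .05.

```python
# Sketch: simulate many "replication studies" of a true but small effect
# and count how often each one reaches p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.2, 50, 10_000  # assumed values for illustration

significant = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    significant += p < 0.05

print(f"Estimated power: {significant / n_studies:.2f}")  # roughly .17, so most studies miss p < .05
```

So a string of non-significant replications is exactly what low power predicts, even when the original effect is genuine.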
Some potentially useful recent papers on stats and methodology...
André, Q. (2021). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General. doi.org/10.1037/xge000…
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 1–38. doi.org/10.5334/joc.72
I argue that an alpha adjustment is *not* necessary when undertaking a single test of an individual hypothesis, even when many such tests are conducted within the same study.
So, in the well-known jelly bean example, in which 20 differently coloured jelly beans are each tested once for a link with acne, it’s perfectly acceptable to claim “green jelly beans linked to acne” using an unadjusted alpha level of .05, given that this claim is based on a *single test* of green jelly beans rather than *multiple tests* of green jelly beans.
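As a rough back-of-the-envelope sketch of the distinction (my own illustration, not from the article; the 20 colours and alpha = .05 follow the usual jelly bean setup): the familywise error rate across all 20 colours is far above .05, but the error rate attached to the *single* test of the green jelly bean hypothesis remains .05.

```python
# Sketch: per-hypothesis error rate vs. familywise error rate for 20 single tests.
alpha, n_colours = 0.05, 20

familywise_error_rate = 1 - (1 - alpha) ** n_colours  # chance of >= 1 false positive across all 20 colours
per_hypothesis_error_rate = alpha                      # chance of a false positive for green specifically

print(f"Familywise error rate across {n_colours} colours: {familywise_error_rate:.2f}")  # ~0.64
print(f"Error rate for the single green jelly bean test: {per_hypothesis_error_rate:.2f}")  # 0.05
```

On this view, the ~.64 figure applies to the joint claim that *at least one* colour is linked to acne, whereas the claim about green jelly beans specifically carries only the .05 per-test error rate.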
Preregistration helps to distinguish planned "confirmatory" tests of a priori hypotheses from unplanned "exploratory" tests of post hoc hypotheses. However, some people argue that this distinction doesn't really matter.
Let’s talk about that!
A THREAD…
Some people argue that the *type* of hypothesis generation (deductive vs. inductive) is more important than the *timing* of hypothesis generation (a priori vs. post hoc; e.g., Worrall, 1985, 2010, 2014).
In particular, we can distinguish between: (a) a deduction from pre-existing theory and evidence (“prediction”) and (b) an induction from the current research results (“accommodation”).
HARKing stands for Hypothesizing After the Results are Known. It occurs when researchers present their post hoc hypotheses as if they are a priori hypotheses. Kerr (1998) wrote the seminal article on HARKing… doi.org/10.1207/s15327…
HARKing has been described as one of the four horsemen of the replication apocalypse (Bishop, 2019). But is it really that bad?
In my view (Rubin, 2017, 2019), HARKing is often seen as problematic because it is bundled together with other “questionable research practices,” such as p-hacking. But what happens if we consider HARKing on its own, separate from these other issues?
There’s been some nice work recently that has taken a critical look at the theory and practice of preregistration. Here’s what I think are some key articles. (Plus my own wee contribution!)
(Thread, 1/9)
Devezer et al. (2020). The case for formal methodology in scientific reform.