Panteleimon
Exercise psychologist; affect & emotion researcher. Tweets about the link b/w exercise and pleasure/displeasure, and the role of exercise in mental health.

Sep 29, 2020, 15 tweets

I am really surprised that, when I comment on the myriad of (mostly HIIT-related) studies involving tiny samples (e.g., 5-10 per group) and a large number of dependent variables and tests of probability, most people do not see what the problem is. doi.org/10.1038/nmeth.…

In short, the combination of these two factors essentially guarantees two things: (1) you will find something "significant," especially if you don't take any steps to address the inflation of alpha, & (2) whatever you find will likely be non-replicable -- in other words, a fluke.

I am afraid that the Kinesiology literature is becoming so inundated with the combination of small samples and long lists of dependent variables and probability tests (almost always at .05), that we have become desensitized to its devastating consequences.

Back in the early 1990s, Robert Schutz had *jokingly* proposed the "Absolute Bottom Line Rule" (ABLR), namely that the total number of tests of probability must not exceed the sample size. The ABLR is routinely violated in the contemporary literature. This is shocking!

On a more general note, I am getting the sense that the quality of statistical education for doctoral students is on the decline -- despite the replication crisis. It is as if Kinesiology has adopted an insular position, pretending that the replication crisis is irrelevant to us.

When I travel, I like to quiz grad students on their statistical training. I do the same sometimes on Twitter. Turns out that a sure way to generate no reaction is to post about statistics and measurement, even if it's by example.

The other day, I joked that, after reviewing physical activity assessment studies, the authors "found" that estimates of reliability were higher than estimates of validity. That's not a "finding," that's the nature of measurement. It could not have been otherwise. No reaction...

Likewise, I post about numerous studies that have blatant problems with multiplicity (i.e., inflation of alpha to horrific levels). Those who respond tend to focus on the fact that I critique (how dare I) but no one seems to be bothered by the routine abuse of alpha itself.

Shockingly, the journals also seem to be OK with this practice. Twenty participants, 50 dependent variables, 100 tests of probability at .05, published in top-tier journals, with press releases pretending the "results" were real. Where is this going? Is anyone paying attention?

The question, then: non-replicable results usually stem from one or both of two things: (a) intentional acts (fabrication or outright statistical abuses) and/or (b) incompetence in methods, analysis, or interpretation. Let's assume that (a) is not a factor. What are we doing about (b)?

To wrap it up: if you have 2 groups, with 8 participants per group, the achieved level of power to detect a small effect (d=0.2, which is reasonable for short interventions) at .05 is ...6.6% (it should be 80% minimum). So, how come researchers find a bunch of "significant" results?
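The 6.6% figure can be checked directly. Here is a minimal sketch (not the thread author's code) that computes the power of a two-sided, independent-samples t-test from the noncentral t distribution, using scipy:

```python
import math
from scipy.stats import nct, t as t_dist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, independent-samples t-test (equal n, equal variance)."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)      # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha / 2, df)    # two-sided critical value
    # Probability of landing in either rejection region under the alternative
    return (1 - nct.cdf(t_crit, df, ncp)) + nct.cdf(-t_crit, df, ncp)

print(round(two_sample_power(0.2, 8), 3))  # roughly 0.066 -- the ~6.6% cited above
```

With d=0.2 and 8 per group, the test is barely more likely to reject than the 5% it would reject by chance alone.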

Obviously, they need to boost their power well above 6.6%. How do they achieve this? First, small samples! These entail very poor precision in estimating population values (huge CIs); the means are going to bounce around, all over the place. So...
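To see just how imprecise n=8 per group is, a quick sketch (assuming standardized scores, sd = 1) computes the half-width of the 95% CI for a between-group mean difference:

```python
import math
from scipy.stats import t

# 95% CI half-width for a difference in means, n = 8 per group, sd = 1 (assumed)
n, sd = 8, 1.0
df = 2 * n - 2
se = sd * math.sqrt(2 / n)            # standard error of the mean difference
half_width = t.ppf(0.975, df) * se
print(round(half_width, 2))  # 1.07
```

The CI spans about ±1.07 standard deviations, more than five times the d=0.2 effect being hunted; observed mean differences really do "bounce around, all over the place."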

Second, a long list of dependent variables (not pre-registered, of course), each analyzed at .05. Remember: if you do just 14 tests of probability at .05, the chance of at least one Type I error exceeds 50% (not 5%). Due to the small samples and their imprecision, something will "come out."
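The 14-test figure follows from the familywise error rate for independent tests, 1 - (1 - alpha)^k. A one-line sketch:

```python
def familywise_alpha(alpha, n_tests):
    # Chance of at least one Type I error across n independent tests at level alpha
    return 1 - (1 - alpha) ** n_tests

print(round(familywise_alpha(0.05, 14), 3))   # 0.512
print(round(familywise_alpha(0.05, 100), 3))  # 0.994
```

At 14 tests the familywise alpha crosses 50%; at the 100 tests described below, a false positive is a near-certainty.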

Meaning, that some comparison will yield a "significant" result -- by chance. This means that you've got a paper. Congratulations. It also means that what you're reporting is likely false. So, now you know. And you can make the decision: what kind of researcher do you want to be?
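The whole recipe can be simulated. In this sketch (an illustration, not any published study), both groups are drawn from the *same* population for all 50 dependent variables, so every "significant" result is, by construction, a false positive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, n_dvs = 8, 50

# All 50 "dependent variables" are pure noise: both groups come from the
# same population, so any p < .05 below is a Type I error.
false_positives = 0
for _ in range(n_dvs):
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(false_positives)  # typically a few "findings" out of 50 -- all flukes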

If you want to be "famous" (you want the New York Times to say your name), then you hold the power to achieve it. Remember: tiny samples, lots of dependent variables (the more, the merrier), all tested at .05. It's easy and it almost never fails. I have empowered you to succeed!
