I am really surprised that, when I comment on the myriad of (mostly HIIT-related) studies involving tiny samples (e.g., 5-10 per group) and a large number of dependent variables and tests of probability, most people do not see what the problem is. doi.org/10.1038/nmeth.…
In short, the combination of these two factors essentially guarantees two things: (1) you will find something "significant," especially if you don't take any steps to address the inflation of alpha, & (2) whatever you find will likely be non-replicable -- in other words, a fluke.
I am afraid that the Kinesiology literature is becoming so inundated with the combination of small samples and long lists of dependent variables and probability tests (almost always at .05), that we have become desensitized to its devastating consequences.
Back in the early 1990s, Robert Schutz had *jokingly* proposed the "Absolute Bottom Line Rule" (ABLR), namely that the total number of tests of probability must not exceed the sample size. The ABLR is routinely violated in the contemporary literature. This is shocking!
On a more general note, I am getting the sense that the quality of statistical education for doctoral students is on the decline -- despite the replication crisis. It is as if Kinesiology has adopted an insular position, pretending that the replication crisis is irrelevant to us.
When I travel, I like to quiz grad students on their statistical training. I do the same sometimes on Twitter. Turns out that a sure way to generate no reaction is to post about statistics and measurement, even if it's by example.
The other day, I joked that, after reviewing physical activity assessment studies, the authors "found" that estimates of reliability were higher than estimates of validity. That's not a "finding," that's the nature of measurement. It could not have been otherwise. No reaction...
Likewise, I post about numerous studies that have blatant problems with multiplicity (i.e., inflation of alpha to horrific levels). Those who respond tend to focus on the fact that I critique (how dare I) but no one seems to be bothered by the routine abuse of alpha itself.
Shockingly, the journals also seem to be OK with this practice. Twenty participants, 50 dependent variables, 100 tests of probability at .05, published in top-tier journals, with press releases pretending the "results" were real. Where is this going? Is anyone paying attention?
Non-replicable results are usually due to 1 or both of 2 things: (a) intentional acts (fabrication or just statistical abuses) and/or (b) incompetence in methods, analysis, interpretation. Let's assume that (a) is not a factor. The question then is: what are we doing about (b)?
To wrap it up: if you have 2 groups, with 8 participants per group, the achieved level of power to detect a small effect (d=0.2, which is reasonable for short interventions) at .05 is ...6.6% (should be 80% minimum). So, how come researchers find a bunch of "significant" results?
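The power figure quoted above can be checked with a rough simulation. This is only a sketch under assumptions that are not stated in the thread: normally distributed scores with equal SDs, a two-sided independent-samples t-test with pooled variance, and a hard-coded critical value of 2.1448 for 14 degrees of freedom (so it only applies to 8 per group). The function name and the seed are made up for illustration.

```python
import random
import statistics

N_PER_GROUP = 8        # participants per group, as in the example above
T_CRIT_DF14 = 2.1448   # two-sided 5% critical value of Student's t, df = 14
                       # (hard-coded assumption; valid only for 8 + 8 - 2 df)

def simulated_power(d=0.2, n_sims=20000, seed=1):
    """Share of simulated two-group studies that reach p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Control group ~ N(0, 1); intervention group shifted by d SDs.
        a = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(d, 1.0) for _ in range(N_PER_GROUP)]
        # Pooled-variance t statistic for equal group sizes.
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        se = (pooled_var * 2 / N_PER_GROUP) ** 0.5
        t = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(t) > T_CRIT_DF14:
            hits += 1
    return hits / n_sims

power = simulated_power()
print(f"Estimated power: {power:.1%}")
```

Under these assumptions the estimate should land in the neighborhood of the 6.6% quoted above, i.e., barely above the 5% you would get with no effect at all.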
Obviously, they need to boost their power well above 6.6%. How do they achieve this? First, small samples! This entails very poor precision in estimating the population values (huge CIs); the means are going to bounce around, all over the place. So...
Second, a long list of dependent variables (non pre-registered, of course), each analyzed at .05. Remember: If you do just 14 tests of probability at .05, the chance of a Type I error exceeds 50% (not 5%). Due to the small samples and their imprecision, something will "come out."
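The "14 tests" arithmetic is easy to verify. A minimal sketch, assuming the tests are independent (correlated dependent variables would inflate alpha somewhat less, but the direction is the same); the function name is hypothetical.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 14, 50, 100):
    fwer = familywise_error(k)
    bonferroni = 0.05 / k  # per-test alpha that keeps the family near .05
    print(f"{k:>3} tests: P(at least one false positive) = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {bonferroni:.4f}")

# Under the independence assumption:
#   1 test  -> 0.050
#  14 tests -> 0.512
#  50 tests -> 0.923
# 100 tests -> 0.994
```

So 14 uncorrected tests already push the chance of a false positive past a coin flip, and at 100 tests a "finding" is all but guaranteed.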
Meaning that some comparison will yield a "significant" result -- by chance. This means that you've got a paper. Congratulations. It also means that what you're reporting is likely false. So, now you know. And you can make the decision: what kind of researcher do you want to be?
If you want to be "famous" (you want the New York Times to say your name), then you hold the power to achieve it. Remember: tiny samples, lots of dependent variables (the more, the merrier), all tested at .05. It's easy and it almost never fails. I have empowered you to succeed!
• • •
Kahneman and exercise science? What is the relevance of the scientific legacy of the great Israeli psychologist and Nobel laureate to the science of physical activity? It's much more than you think. A thread -- and a tribute...
Let's start from this. Imagine that you bring together the world's best physical activity epidemiologists, experts in physical activity assessment, exercise physiologists, and sports medicine physicians. You put them in a room at the @WHO headquarters and ask them to develop the next physical activity guidelines. What are they going to come up with?
In the absence of input from the behavioral sciences, the team will likely follow what I call the "common sense approach" to developing physical activity guidelines. For example, analyze the @WHO guidelines. The part at the top gives the rationale for physical activity.
In November 2020, my students and I discovered a completely fake meta-analysis, now cited more than 100 times. I notified @Hindawi but, as shown below, they have no intention to act. Each year, on the anniversary of the discovery, I re-post this thread...
This is the meta-analysis in question, supposedly summarizing RCTs examining the effects of exercise in patients with chronic renal failure. Note that the APC for "BioMed Research International" is $2,550.
Also note that @WileyGlobal bought @Hindawi in 2021 for $298 million, evidently unbothered by the fact that @Hindawi is generally not considered a reputable scientific publisher.
Steve Blair, an iconic figure in the field of exercise science over the past four decades, has passed away at the age of 84. He is leaving behind an enormous legacy. I would like to share a few thoughts...
In my mind, Blair was the last of the trio of physical activity epidemiologists that gave our field a prominent place in contemporary medicine and public health. We lost Jerry Morris on Oct 28, 2009. We lost Ralph Paffenbarger on Jul 9, 2007. We lost Steve Blair on Oct 6, 2023.
Parenthetically, if you are interested, the @ACSMNews has a wonderful 22-minute video at the YouTube link below, featuring a conversation with Blair and Paffenbarger. Highly recommended.
Headlines of this sort (what you thought you knew is actually false) are becoming increasingly common. While we can debate their scientific value, one thing is for sure: they are *wonderful* training opportunities for Kinesiology / Exercise Science students. Let's take a look...
The first thing to note is that these headlines are happening against the backdrop of tremendous activity in the dementia field following the flop of Aducanumab (Aduhelm). Now, there is lecanemab, also a monoclonal antibody, with similar side-effects (brain swelling, bleeding).
So, let's look at the study in question. The MEDEX (Mindfulness, Education, and Exercise) randomized controlled trial ($3M) aimed to compare mindfulness-based stress reduction and exercise, alone or in combination, with a control intervention (health ed).
Today is the first formal step toward the culmination of a 10-year process of trying to analyze and comprehend the phenomenon of HIIT within exercise science. Paper II (from a set of 6) with @NBTiller is the first to become available online (DM for PDF).
In this paper, @NBTiller and I address the increasing prevalence of "spin" by examining 4 extraordinary claims that appeared in the HIIT research literature and subsequently made a splash as media headlines. We dissect the underlying research used as the basis for these claims.
What we find is a narrative that has run amok, becoming disconnected from the data; blatant neglect of basic methodological and statistical principles; serious errors of reporting; a striking absence of critical appraisal by journals, university press offices, and the mass media.
When you read that power calculations determined that a sample size of "8 per group" sufficed to provide 80% power, do you get a queasy feeling in your stomach? Like something ain't right? And does the paper start to smell fishy all of a sudden? Don't you get the urge to verify?
So, your stomach would be correct. Let's set aside for a moment that expecting 50% superiority from an 8-week intervention is kind-of ludicrous. Since 50% of 15 is 7.5, comparing 15±5 to 22.5±5 gives d=0.61, which requires 43 per group (not 8) to reach 80% power.
Then, you read that VO2peak changed from 22.6±8.2 to 24.7±7.9 (+2.1 ml) in one group and from 23.2±5.4 to 26.7±5.8 (+3.5 ml) in the other but "improvements in CRF" were "larger" in the latter group (with N=10 per group). Don't you get a strange feeling that those means are close?