TOPIC: P-Values in Table 1 of RCTs. Time to revisit this poll.
Thanks very much to the clinicians who responded. This came out better than expected, though the selection bias of “clinicians who follow statisticians on Twitter” suggests that the respondents are collectively better versed in data analysis than the general research population.
Anyway, putting p-values in Table 1 of RCTs is an inappropriate use of significance testing, yet it remains prevalent in the medical literature because it SEEMS to make so much sense (at least, given the way most people have been taught p-values and statistical significance…)
There are two separate problems here:
(1): the mistaken belief that perfect “baseline balance” is necessary for a treatment comparison to be valid, and that any deviations from such balance (presumably, as shown by p<0.05’s in the baseline table) undermine the trial’s primary comparison.
(2): the mistaken belief that a p-value offers meaningful information that helps assess the danger described in problem (1).
We’ll cover problem (2) first, then return to (1) later - it’s more complicated.

Anyway, what’s the deal with p-values comparing the randomized treatment arms?
Let’s explain using the two-sample t-test, which is meant to determine whether the observed data are consistent with an assumption (the null hypothesis) that the mean of the population from which one sample was drawn equals the mean of the population from which the other was drawn.
Ex: if one wishes to test whether people with brown eyes tend to be taller than people with blue eyes, one might recruit a random sample of people with brown eyes and people with blue eyes, measure the heights in each, and perform a two-sample t-test comparing the sample means.
In this setting, the p-value represents the probability of observing a difference at least as large as the one in our sample data under the null hypothesis (in this case, that the mean height of brown-eyed people equals the mean height of blue-eyed people in the population of interest).
Suppose that the p-value was 0.01, meaning (using common language…) there was only a 1% chance of observing a difference at least this large in our sample if the 2 populations of interest (brown-eyed people vs blue-eyed people) actually share a common distribution of height.
Since this difference would be unlikely to occur under H0, we “reject” the null hypothesis that brown-eyed people and blue-eyed people share a common distribution of height, concluding that…
…our sample provides evidence that the population distribution of height is different for brown-eyed people than it is for blue-eyed people.
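If you want to see the mechanics, here’s a minimal sketch in Python (the heights and the 2 cm difference are numbers I made up for illustration, not real data):

```python
# Two-sample t-test on simulated heights for the eye-color example.
# All numbers here are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Pretend the brown-eyed population really is 2 cm taller on average.
brown_eyes = rng.normal(loc=172.0, scale=8.0, size=100)  # heights in cm
blue_eyes = rng.normal(loc=170.0, scale=8.0, size=100)

# H0: the two population means are equal.
t_stat, p_value = stats.ttest_ind(brown_eyes, blue_eyes)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```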
Now, let’s talk about a common Table-1-of-an-RCT scenario: comparison of the mean age of participants assigned to Drug A versus Drug B. Performing a two-sample t-test comparing mean age between the two arms is testing…
…the probability of observing these data under the null hypothesis that the “population mean” age of participants assigned to Drug A is equal to the “population mean” age of participants assigned to Drug B.
However, the participants assigned to Drug A came from the *same population* as participants assigned to Drug B (patients meeting inclusion criteria that enrolled in the trial).
The patients assigned to each treatment arm share the *same* population distribution for all baseline variables because they are selected from a common population, then randomly split into 2 (or however many) groups.
As the late Doug Altman said: “performing a significance test to compare baseline variables [my addition: in an RCT] is to assess the probability of something having occurred by chance when we know that it did occur by chance.”
Furthermore, as Frank Harrell has pointed out, substituting randomly generated numbers for all baseline variables would show a handful of “baseline differences” between the randomized groups, too. No one would suggest that we need to “adjust” for the randomly generated numbers.
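Harrell’s point is easy to check yourself. Here’s a quick simulation (my sketch, not his code): generate a pile of pure-noise “baseline variables” for two randomized arms, t-test each one, and roughly 5% come out “significant” at the 0.05 level, exactly as advertised.

```python
# Pure-noise "baseline variables" compared between two randomized arms:
# about 5% of the t-tests land below p = 0.05 by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_per_arm, n_variables = 150, 1000

arm_a = rng.normal(size=(n_variables, n_per_arm))
arm_b = rng.normal(size=(n_variables, n_per_arm))

p_values = stats.ttest_ind(arm_a, arm_b, axis=1).pvalue
print(f"fraction with p < 0.05: {(p_values < 0.05).mean():.3f}")  # ~0.05
```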
This was covered repeatedly in the statistical literature in the 1980s and 1990s:
Altman DG. Comparability of randomized groups. J Royal Stat Soc 1985; 34: 125-136.
Altman DG, Dore CJ. Randomisation and baseline comparison in clinical trials. Lancet 1990; 335: 149-153.
Senn SJ. Baseline comparisons in randomized clinical trials. Stat Med 1991; 10: 1157-1160.
Senn SJ. Testing for baseline balance in clinical trials. Stat Med 1994; 13: 1715-1726.
Begg CB. Significance tests of covariate imbalance in clinical trials. Controlled Clin Trials 1990; 11: 223-225.
CONSORT guidelines recommend against it: “significance testing of baseline differences in randomized controlled trials (RCTs) should not be performed, because it is superfluous and can mislead investigators and their readers”
Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, et al. CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010; 63: e1–37.
And yet, despite the marvelous papers by Altman, Begg, Senn, and others on the subject, many researchers (clinicians and statisticians both) still believe that p-values comparing the randomized treatment arms are the appropriate first step in analyzing / interpreting an RCT.
I’ve seen reviewers and editors ask for it. A team of researchers (de Boer et al, cited below) described their experience as follows:
“…in our submitted papers we followed the CONSORT statement in not testing for baseline differences. However, after submission of the papers we were again faced with comments that tests of baseline differences should be added, but now from reviewers or even editors.”
“To our surprise and dismay, these reviewers insisted on this point even after we had provided a logical explanation why we preferred not to present these p-values. Eventually, we decided to add the tests and as a result they are included in all four of our publications.”
Reference: de Boer MR, Waterlander WE, Kuijper LDJ, Steenhuis IHM, Twisk JWR. Testing for baseline differences in randomized controlled trials: an unhealthy research behavior that is hard to eradicate. Int J Behav Nutr Phys Act 2015; 12: 4.
So, yeah, this is definitely still a problem in the medical literature.
The #medtwitter crowd that frequently engages with Harrell, Senn, et al has probably started to come around on this. My little survey is likely biased, since my sphere of Twitter influence/conversation includes mostly MDs who engage with statisticians.
Strong suspicion (unproven, of course) that this survey, if carried out in many academic-medicine departments, would have >80% of people answering that the p-values in Table 1 are necessary because you need to compare the randomized treatment arms and/or check for balance.
But please, share this information (the Twitter thread for the lay version; the Senn, Altman, and de Boer papers for a more technical and professional explanation) with your friends and colleagues.

Together, maybe, one day, people will stop asking about p-values in Table 1 of RCTs.
“Why didn’t you write a paper about this?” – I’m working on it. But this isn’t really new content. Statisticians published a bunch of these papers in the 1980s and 1990s. The trick is getting people to read them, and then to *change their thinking* as a result.
With respect to Senn, Altman, and others (all fine writers), it seems that papers published in statistics journals tend to be ignored or trivialized by the clinical research world. Understandably, to a degree: clinicians already have a hundred medical journals they’re trying to keep up with.
It’s time to try communicating on multiple fronts, both traditional and nontraditional.
Oh, as for this: “(1) the mistaken belief that perfect “baseline balance” is necessary for a treatment comparison to be valid, and that any deviations from such balance (presumably, as shown by p<0.05’s in the baseline table) undermine the trial’s primary comparison.”
We’ll come back to that another time. A few brief thoughts:
Paraphrasing Harrell: Statistical inference is based on probability distributions. It is sufficient to know that the tendency was for baseline covariates to be balanced, because it is the tendency on which assumptions of the statistical tests are based.
Paraphrasing Senn: The probability calculation applied to a clinical trial automatically makes an allowance for the fact that groups will almost certainly be unbalanced, and if one knew that they were balanced, then the calculation that is usually performed would not be correct.
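Senn’s point can also be illustrated by simulation (again my own sketch, under made-up assumptions: a strong prognostic covariate and no true treatment effect). Any single trial will show some chance imbalance in the covariate, yet the usual unadjusted test rejects at ~5% across randomizations, because the probability calculation already allows for that imbalance.

```python
# Under H0 (no treatment effect), repeat the trial many times: the
# unadjusted outcome test rejects ~5% of the time across randomizations,
# even though each individual trial has some chance covariate imbalance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_trials, n_per_arm = 2000, 100

rejections = 0
for _ in range(n_trials):
    covariate = rng.normal(size=2 * n_per_arm)            # prognostic baseline variable
    outcome = covariate + rng.normal(size=2 * n_per_arm)  # no treatment effect at all
    arm = rng.permutation(2 * n_per_arm) < n_per_arm      # random 1:1 assignment
    p = stats.ttest_ind(outcome[arm], outcome[~arm]).pvalue
    rejections += p < 0.05

print(f"type I error rate: {rejections / n_trials:.3f}")  # ~0.05
```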
And finally, the focus on “baseline balance” has impeded discussion of a more productive step in the analysis of RCTs that would alleviate some of these concerns: pre-specifying baseline covariates to be adjusted for in the final treatment comparison.
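As a rough illustration of what that buys you, here’s a minimal sketch (simulated data and made-up effect sizes, not drawn from any of the papers cited below): regressing the outcome on treatment plus a pre-specified prognostic covariate shrinks the standard error of the treatment effect relative to the unadjusted comparison.

```python
# Pre-specified covariate adjustment in a simulated trial: adjusting for
# a prognostic baseline covariate shrinks the SE of the treatment effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200

covariate = rng.normal(size=n)                            # prognostic baseline variable
treatment = (rng.permutation(n) < n // 2).astype(float)   # random 1:1 assignment
outcome = 0.5 * treatment + covariate + rng.normal(size=n)

# Unadjusted: outcome ~ treatment
unadjusted = sm.OLS(outcome, sm.add_constant(treatment)).fit()
# Adjusted: outcome ~ treatment + covariate (pre-specified, not data-driven)
adjusted = sm.OLS(outcome, sm.add_constant(np.column_stack([treatment, covariate]))).fit()

print(f"unadjusted SE: {unadjusted.bse[1]:.3f}")  # larger
print(f"adjusted SE:   {adjusted.bse[1]:.3f}")    # smaller, i.e. more power
```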
Many, many, many papers in the clinical epi/statistics literature have discussed this:
Canner PL. Covariate adjustment of treatment effects in clinical trials. Controlled Clin Trials 1991; 12: 359-366.
Tukey JW. Tightening the clinical trial. Controlled Clin Trials 1993; 14: 266-285.
Neuhaus JM. Estimation efficiency with omitted covariates in generalized linear models. J Am Stat Assoc 1998; 93: 1124-1129.
Hauck WW, Anderson S, Marcus SM. Should we adjust for covariates in nonlinear regression analyses of randomized trials? Controlled Clin Trials 1998.
Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics? Am Heart J 2000; 139(5): 745-751.
Hernandez AV, Steyerberg EW, Habbema JDF. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J Clin Epi 2004; 57(5): 454-460.
Hernandez AV, Eijkemans MJC, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power? Ann Epi 2006; 16(1): 41-48.
Gray LJ, Bath P, Collier T. Should stroke trials adjust functional outcome for baseline prognostic factors? Stroke 2009.
Kent DM, Trikalinos TA, Hill MD. Are unadjusted analyses of clinical trials inappropriately biased toward the null? Stroke 2009.
Lingsma H, Roozenbeek B, Steyerberg E. Covariate adjustment increases statistical power in randomized controlled trials. J Clin Epi 2010; 63(12): 1391.
Groenwold RHH, Moons KGN, Peelen LM, Knol MJ, Hoes AW. Reporting of treatment effects from randomized trials: a plea for multivariable risk ratios. Contemp Clin Trials 2011; 32(3): 399-402.
Ciolino JD, Martin RH, Zhao W, Jauch EC, Hill MD, Palesch YY. Covariate imbalance and adjustment for logistic regression analysis of clinical trial data. J Biopharm Stat 2013; 23(6): 1383-1402.
Thompson DD, Lingsma HF, Whiteley WN, Murray GD, Steyerberg EW. Covariate adjustment had similar benefits in small and large randomized controlled trials. J Clin Epi 2015; 68(9): 1068-1075.
Lee PH. Covariate adjustments in randomized controlled trials increased study power and reduced biasedness of effect size estimation. J Clin Epi 2016; 76(1): 137-146.
Jiang H, Kulkarni PM, Mallinckrodt CH, Shurzinske L, Molenberghs G, Lipkovich I. Covariate adjustment for logistic regression analysis of binary clinical trial data. Stat Biopharm Res 2017; 9(1): 126-134.
