(THREAD) This recent publication offers a teachable moment on data analysis of parallel-group RCTs that include pre- and post-treatment measures of a continuous outcome variable: ncbi.nlm.nih.gov/pubmed/30864188
The authors randomly assigned participants with type 2 diabetes to receive either curcumin (n=23) or placebo (n=21) for 10 weeks.
Participants gave blood samples at baseline and at the end of the trial, from which the research team measured triglycerides, total cholesterol, LDL, HDL, hs-CRP, and adiponectin at both time points.
The study team did some things properly in carrying out this trial. Unfortunately, they did not use a very good approach for their data analysis and reporting of results.
For the moment, let’s put aside multiplicity considerations and focus on just one outcome: triglycerides, since a) it’s listed at the top of Table 4 and b) the authors chose to highlight triglyceride (TG) results in their abstract and conclusions.
Abstract - triglyceride results: “The mean serum level of TG decreased in curcumin group compared with baseline (109 ± 36 vs. 124 ± 36; p < 0.05)…The results of the current study indicate that curcumin consumption may reduce diabetes complications through decreasing TG level...”
Wait a minute: the authors reported the pre-post TG results within the curcumin group (no mention of the values in the placebo group or comparison between groups) and concluded that curcumin was effective in reducing triglycerides.
Digging into the manuscript, in Table 4: the authors performed paired t-tests of the pre-post values *within* each treatment group and got a “significant” result (p=0.03) for end-of-trial minus baseline in the curcumin group and a non-significant result (p=0.73) in the placebo group.
This is a classic mistake from many hopeful authors: test pre-post change *within* each treatment group, and if the change is “significant” in one group and “not significant” in the other group, conclude that the treatment was effective.
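To see how misleading that logic can be, here’s a quick simulation (my own illustration with made-up parameters, not the paper’s data) in which both arms have an identical true pre-post change; the “significant in one arm, not the other” pattern still shows up in roughly a third of runs with these settings.

```python
# Simulate many two-arm trials where BOTH arms share the same true
# pre-post change, and count how often the naive within-group logic
# (one arm p<0.05, the other p>0.05) would "find" a treatment effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, discordant = 10_000, 0
for _ in range(n_sims):
    pre_a = rng.normal(124, 36, 23)              # "treatment" arm, n=23
    pre_b = rng.normal(124, 36, 21)              # "placebo" arm, n=21
    post_a = pre_a - 5 + rng.normal(0, 20, 23)   # true change: -5 in BOTH arms
    post_b = pre_b - 5 + rng.normal(0, 20, 21)
    p_a = stats.ttest_rel(post_a, pre_a).pvalue
    p_b = stats.ttest_rel(post_b, pre_b).pvalue
    if (p_a < 0.05) != (p_b < 0.05):             # discordant "significance"
        discordant += 1
print(f"{discordant / n_sims:.0%} of identical-effect trials show the pattern")
```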
Interestingly enough, they also tested the difference in the “change in triglycerides” between the treatment groups and found no significant difference. That’s still not the best way to test the treatment effect, but it’s closer to what they should have done.
In the abstract and conclusions, they ignored that portion of the analysis and instead reported the within-group pre-post effect, probably because it’s the one that looked “more significant”.
What’s wrong with all this?
As @f2harrell explains here (fharrell.com/post/errmed/#c…), the primary treatment comparison should have been an analysis of covariance: the end-of-study value as the primary outcome, covariate-adjusted for the baseline value, to estimate the treatment effect of curcumin.
“Many authors and pharmaceutical clinical trialists make the mistake of analyzing change from baseline instead of making the raw follow-up measurements the primary outcomes, covariate-adjusted for baseline.”
Another quote from Dr. Harrell which explains this very nicely:
“The purpose of a parallel-group randomized clinical trial is to compare the parallel groups, not to compare a patient with herself at baseline...
... The central question is for two patients with the same pre measurement value of x, one given treatment A and the other treatment B, will the patients tend to have different post-treatment values? This is exactly what analysis of covariance assesses.”
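In model form, this is a standard ANCOVA specification (the notation below is mine, not Dr. Harrell’s):

$$\text{post}_i = \beta_0 + \beta_1\,\text{pre}_i + \beta_2\,\text{treat}_i + \varepsilon_i$$

where $\text{treat}_i$ indicates the randomized group; the coefficient $\beta_2$ is the adjusted between-group difference in post-treatment values for patients with the same baseline value.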
Take-home lessons if you are conducting a parallel-group RCT with a continuous outcome variable (blood pressure, lipids, anything of the sort):
1) DO NOT: perform significance tests for within-group change in the outcome(s). It is inappropriate to run paired t-tests for within-group change, then use p<0.05 for one group and p>0.05 for the other to claim that treatment was more effective at changing the outcome.
2) DO: perform analysis of covariance with the final value as the outcome variable, treatment group as your primary effect of interest, and a covariate adjustment for the baseline value (a minimal code sketch follows after this list).
This actually answers the question you should be trying to answer: for two patients with the same PRE value, will they tend to have different POST values if they receive treatment A versus treatment B?
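Here’s what that looks like in practice: a minimal sketch in Python/statsmodels on simulated data. The variable names (tg_pre, tg_post, curcumin), the arm sizes of 23 and 21, and the assumed true effect of -8 mg/dL are illustrative choices, not the trial’s actual data.

```python
# ANCOVA sketch: final value as outcome, treatment as the effect of
# interest, baseline as a covariate. All data below are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2024)
curcumin = rng.permutation(np.repeat([1, 0], [23, 21]))   # randomized arms
tg_pre = rng.normal(124, 36, size=44)                     # baseline TG
tg_post = 30 + 0.75 * tg_pre - 8 * curcumin + rng.normal(0, 20, size=44)

df = pd.DataFrame({"tg_post": tg_post, "tg_pre": tg_pre, "curcumin": curcumin})

fit = smf.ols("tg_post ~ tg_pre + curcumin", data=df).fit()
# The 'curcumin' coefficient is the adjusted treatment effect (beta_2 above)
print(fit.params["curcumin"], fit.conf_int().loc["curcumin"].values)
```

One p-value for the question of interest, instead of two within-group p-values that don’t answer it.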
• • •
Noninferiority trials: a musing thread that I may regret.
NI trials are fickle & unsatisfying. Sometimes there's a legitimately good reason to do them (discussed below); the stats are maddening (also discussed below).
Suppose we have a particular group of patients who need to undergo a certain procedure. The procedure has a theoretical, biologically plausible risk of causing a particular complication; we generally give patients some prophylactic therapy against that complication.
Since no intervention is totally benign, we know that even this prophylactic therapy has its own risks, so we wonder if perhaps we could give patients a lower dose of it than typically used without losing the protective effect against the complication it's intended to prevent.
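The maddening part, in a minimal sketch with entirely hypothetical counts and margin (not from any specific trial): we prespecify a noninferiority margin for the risk difference and declare noninferiority only if the whole confidence interval for the excess risk on the lower dose sits below that margin.

```python
# Fixed-margin noninferiority test on a risk difference; all numbers
# (event counts, sample sizes, the 5-point margin) are hypothetical.
import numpy as np
from scipy import stats

events_low, n_low = 12, 200   # complications on the lower prophylactic dose
events_std, n_std = 10, 200   # complications on the standard dose
margin = 0.05                 # prespecified noninferiority margin

p_low, p_std = events_low / n_low, events_std / n_std
diff = p_low - p_std          # excess risk with the lower dose

# Wald standard error of the risk difference
se = np.sqrt(p_low * (1 - p_low) / n_low + p_std * (1 - p_std) / n_std)
upper = diff + stats.norm.ppf(0.975) * se   # upper limit of two-sided 95% CI

print(f"risk difference {diff:+.3f}, upper 95% bound {upper:+.3f}")
print("noninferior" if upper < margin else "noninferiority NOT demonstrated")
```

With these made-up numbers the observed excess risk is only 1 point, yet the upper bound (about 0.055) still crosses the margin, which is a taste of why NI trials demand such large samples.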
@sim0ngates @chrisdc77 @hafetzj @eturnermd1 Yeah, I thought about (and should have said something about) the distinction between industry-funded vs academic-sponsored trials. The exact process is a bit different, but the challenges would be similar-ish. Agree that industry/regulatory bodies would have to be on board.
@sim0ngates @chrisdc77 @hafetzj @eturnermd1 Of course the easiest way to make this happen would be for the major regulators to make it happen. But as Chris (I think?) said a little while ago, this was evidently part of the original discussion for clinical trials dot gov, but they didn’t go all the way to RRs.
@sim0ngates @chrisdc77 @hafetzj @eturnermd1 I think some academic trialists might be persuaded, or at least attracted, by the idea that they could have a much-expedited peer review process on the back end. It can be frustrating to do a trial, write up your results & then spend another year submitting to 3 different journals.
Thread on relationships between researchers and statistical consultants. Prompted by a few recent tweets, but not only those as this is a recurring and always-relevant conversation.
On the "researcher seeking stats help" side, there is an often-justified feeling that statistical consultants are difficult to work with (even those in good faith) and sometimes downright unhelpful or unpleasant.
So - let's address those right up front as part of this thread about making these relationships productive & relatively happy.
Has anyone in *medicine* (or otherwise, but particularly interested in US academic medicine) actually proposed a study where they said they'd use an alpha threshold above 0.05? How was it received? (cont)
(Also, please do me a favor, spare me the arguments about NHST being a flawed paradigm on this particular thread)
Clearly not all studies have the same tradeoffs between a false-positive and a false-negative finding, and in some cases a higher alpha threshold seems warranted...
@Jabaluck @_MiguelHernan @aecoppock I think (perhaps unsurprisingly) that this shows “different people from different fields see things differently because they work in different contexts” - the scenario you painted here is not really possible with how most *medical* RCTs enroll patients & collect baseline data.
@Jabaluck @_MiguelHernan @aecoppock The workflow for most medical RCTs (excepting a few trial designs…which I’ll try to address at the end if I have time) is basically this:
@Jabaluck @_MiguelHernan @aecoppock 1. Clinics/practices/hospitals know that they are enrolling patients in such-and-such trial with such-and-such criteria.
Amusing Friday thoughts: I've been reading Stuart Pocock's 1983 book Clinical Trials: A Practical Approach (do not concern yourself with the reason).
There is a passage on "Statistical Computing" in Chapter 11 of the book that one might have expected to age poorly, but it is remarkable how well several of its statements have held up.
"I would like to refer briefly to the frequent misuse of statistical packages. Since they make each analysis task so easy to perform, there is a real danger that the user requests a whole range of analyses without any clear conception of what he is looking for."