1) Why do we run group-sequential trials in drug development?

2) How does the effect we power at relate to the effect(s) needed to stop?

3) Is stopping such a trial early (like the one for the Pfizer vaccine) "cheating"?

4) If a trial stops early, the effect estimate may be biased. Is this an issue?

5) What happens operationally if a trial stops early?

A thread from a pharma statistician who has developed, run, analyzed, and taught courses about trials with such designs.

@statsepi @lakens @MaartenvSmeden @stevesphd @ADAlthousePhD @DominicMagirr @thomas_jaki
Computations use the R package rpact (cran.r-project.org/package=rpact, rpact.org). See in particular the vignettes at rpact.org/vignettes, written by the rpact developers and by Marcel Wolbers from Roche.

A markdown file to reproduce the numbers in this thread is here:

rufi77.github.io/twitter/202011…
1) Why?

Assume a time-to-event endpoint, alpha = 0.05, power = 80%, and a hazard ratio to detect of 0.75.

Number of events needed for a single-stage trial: 380. In a single-stage trial you wait for this number of events IN ANY CASE, i.e. even if your initial guess of HR = 0.75 was off.
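
The 380 follows from the standard Schoenfeld approximation for 1:1 randomization, d = 4 (z_{1-alpha/2} + z_{1-beta})^2 / log(HR)^2. As a quick check in R:

ceiling(4 * (qnorm(0.975) + qnorm(0.8))^2 / log(0.75)^2)   # 380 events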
Assume you add a futility interim (stop the trial if the observed HR ≥ 1) after 30% of events and an efficacy interim (O'Brien-Fleming alpha-spending) after 66% of events. This increases the maximal number of events needed from 380 to 408.

Interims are performed after 123 and 270 events.
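
For those who want to reproduce this: a minimal sketch of the design with rpact could look as below. The linked markdown file has the exact code; the specific argument choices here, e.g. one-sided alpha = 0.025 as the one-sided equivalent of the two-sided 0.05, a z-scale futility bound of 0, and non-binding futility, are my assumptions.

library(rpact)

# Futility look at 30% information (z = 0, i.e. observed HR = 1) and an
# efficacy look at 66% with O'Brien-Fleming-type alpha-spending:
design <- getDesignGroupSequential(
  kMax = 3, sided = 1, alpha = 0.025, beta = 0.2,
  informationRates = c(0.3, 0.66, 1),
  typeOfDesign = "asOF",
  futilityBounds = c(0, -6),   # -6 on the z-scale = no futility stop at look 2
  bindingFutility = FALSE
)

# Event numbers for HR = 0.75; the maximal number of events should be
# close to the 408 quoted above:
getSampleSizeSurvival(design = design, hazardRatio = 0.75)
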
Now if we run 100 such trials, some of them will actually stop at the 1st or 2nd interim. The probabilities of that happening are, under H0 / H1:

futility: 0.50 / 0.06
efficacy: 0.006 / 0.43

So e.g. if the drug is useless, half of all trials will stop at the futility interim.
Stopping at an interim of course means we need to collect far fewer events. The *expected* number of events is thus:

Under H0: 0.50 ⋅ 123 + 0.006 ⋅ 270 + 0.49 ⋅ 408 = 264.
Under H1: 0.06 ⋅ 123 + 0.43 ⋅ 270 + 0.51 ⋅ 408 = 332.
So in both cases the expected number of events is *much less* than the 380 we need to collect in any case in a single-stage design. That is the main advantage of such designs.
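
In R, this is just a weighted sum (stopping probabilities as above; a minimal sketch in base R):

events <- c(123, 270, 408)                  # events at interim 1 / interim 2 / final
pH0 <- c(0.50, 0.006, 1 - 0.50 - 0.006)     # stop futility / stop efficacy / continue to final
pH1 <- c(0.06, 0.43,  1 - 0.06 - 0.43)

sum(pH0 * events)   # ~264 expected events under H0 (up to rounding)
sum(pH1 * events)   # ~332 expected events under H1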

2) How does the effect we power at, 0.75, relate to the effect size needed to stop for efficacy?
At the efficacy interim, to stop early the p-value must be ≤ 0.012; for the trial to be significant at the final analysis it must be ≤ 0.046. These significance levels correspond to hazard ratios of 0.735 and 0.821, respectively.

The latter are sometimes called minimal detectable differences (MDDs).
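
A back-of-envelope check of these numbers (a sketch using the standard approximation se(log HR) ≈ 2/√d for d events under 1:1 randomization; rpact's exact boundary values differ slightly):

# Minimal detectable HR at two-sided significance level p with d events:
mdd <- function(p, d) exp(-qnorm(1 - p / 2) * 2 / sqrt(d))

mdd(0.012, 270)   # ~0.74 at the efficacy interim (rpact: 0.735)
mdd(0.046, 408)   # ~0.82 at the final analysis (rpact: 0.821)
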
Often, people believe that in order to stop a trial early, the effect seen at the interim must be *much larger* than what we assumed for powering. Comparing 0.75 to 0.735, it is clear that this is not the case. That the MDD at the interim and the effect we power at are about the same is typical for an OBF-type boundary and an interim after about 2/3 of the information.

Another common belief is that in order to be significant at the final analysis we need to observe a hazard ratio ≤ 0.75. Again, not true: the MDD at the final analysis is 0.821, i.e. this is the hazard ratio we need to beat to get a p-value of 0.046 or lower.

3) "Cheating"? Methodology for group-sequential designs is developed such that familywise-error rate of *all looks at the data* is kept. This is why at the final analysis, the p-value needs to be ≤0.046, not ≤0.05.
This is the price to pay for the interim look.

But why 0.012 + 0.046 > 0.05? Isn't that cheating? No: by exploiting the correlation between the test statistics at the interim and the final analysis, you can "gain" a bit of alpha. Again, no cheating; the FWER is always protected.
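
A quick numerical check of this claim (a sketch; it ignores the negligible alpha spent at the futility look and treats the quoted two-sided levels as one-sided boundaries at half the level):

library(mvtnorm)

# The z-statistics at the efficacy interim and the final analysis are
# bivariate normal with correlation sqrt(information fraction):
rho <- sqrt(270 / 408)
z1  <- qnorm(1 - 0.012 / 2)   # efficacy boundary at the interim
z2  <- qnorm(1 - 0.046 / 2)   # boundary at the final analysis

# P(cross the boundary at the interim OR the final) under H0:
1 - pmvnorm(upper = c(z1, z2), corr = matrix(c(1, rho, rho, 1), 2))
# ~0.025 one-sided (= two-sided 0.05), although 0.012 + 0.046 > 0.05.
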
4) If a trial stops early, the effect estimate may be biased. Is this an issue?

Much has been written about inference adjusted for the fact that a trial stopped early. I'd just like to give a median unbiased estimate of the hazard ratio in our example. Assume at the futility interim we observe HR = 0.69 and at the efficacy interim HR = 0.66, with a *conventional* 95% CI from 0.51 to 0.85. Since 0.66 ≤ 0.735, the trial stops for efficacy.

The median unbiased estimate accounting for early stopping is 0.68, with an adjusted CI from 0.53 to 0.86. So the conventional and adjusted analyses are close.
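
A sketch of how such an adjusted analysis can be set up in rpact (argument names as I understand the rpact documentation; the log-rank z-values are hypothetical, back-calculated from the quoted HRs via log(HR)·√d/2; the linked markdown file has the exact code):

library(rpact)

results <- getAnalysisResults(
  design    = design,   # the group-sequential design object from above
  dataInput = getDataset(overallEvents   = c(123, 270),
                         overallLogRanks = c(-2.06, -3.41)),
  directionUpper = FALSE   # benefit corresponds to HR < 1
)
results   # contains, among others, a median unbiased estimate and adjusted CI
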
This is in line with the literature.

5) What happens operationally if a trial stops early? Statistically, we "stopped" the trial at the efficacy interim and rejected H0: hazard ratio = 1 under full type I error control. We would thus proceed with filing the drug.
But of course, operationally the trial would continue: more follow-up data would be collected on primary and secondary endpoints (e.g. OS), and safety, biomarker, etc. data collection would also continue, typically for years.

Also, one would often still do an analysis at 408 events, the initially planned final analysis, to make sure results persist over time.

Note that stopping at the efficacy interim typically leads to unblinding, so analysis and interpretation of follow-up data need caution and expertise.
So group-sequential designs reduce the expected number of events needed and provide valid inference in reasonable settings.

All of this sits within the framework of hypothesis testing, as required by Health Authority guidelines.

I hope this thread is useful. Comments welcome!

The end.
