Kert Viele
Clinical Trial Designer. Director of Modeling and Simulation, Berry Consultants. Statistics PhD Carnegie Mellon.
Sep 5, 2022 14 tweets 3 min read
(1/14) I will go back to papers soon…but a brief “where are we and where are we going?” thread on non-concurrent controls, involving the science and regulatory climate.

(2/14) As for science, recent papers have done a good job illustrating
1. how/why non-concurrent controls can produce benefits
2. quantifying those potential benefits
3. identifying the key assumption needed to avoid risks
4. quantifying the cost of violating that assumption (see the simulation sketch below)
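A quick illustrative sketch of points 2 and 4 (made-up rates, drifts, and sample sizes, not taken from the papers): pooling non-concurrent controls shrinks the standard error of the treatment-vs-control comparison, but a time trend ("drift") in the control rate between eras biases it.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(drift, n_nonconc=200, n_conc=200, n_trt=200,
             p_control=0.30, effect=0.10, n_sims=5000):
    """Compare 'concurrent only' vs 'pooled with non-concurrent controls'.

    Non-concurrent controls were enrolled earlier; `drift` is how much the
    control response rate has shifted by the time the new arm enters.
    All numbers here are illustrative.
    """
    diff_conc, diff_pool = [], []
    for _ in range(n_sims):
        y_nonconc = rng.binomial(n_nonconc, p_control)           # earlier era
        y_conc = rng.binomial(n_conc, p_control + drift)         # current era
        y_trt = rng.binomial(n_trt, p_control + drift + effect)  # current era

        p_trt, p_conc = y_trt / n_trt, y_conc / n_conc
        p_pool = (y_nonconc + y_conc) / (n_nonconc + n_conc)
        diff_conc.append(p_trt - p_conc)
        diff_pool.append(p_trt - p_pool)

    print(f"drift={drift:+.2f}: SD {np.std(diff_conc):.3f} (concurrent only) vs "
          f"{np.std(diff_pool):.3f} (pooled); pooled bias {np.mean(diff_pool) - effect:+.3f}")

for drift in [0.0, 0.05, 0.10]:
    simulate(drift)
```

With drift = 0 the pooled comparison is both unbiased and more precise; with drift > 0 its bias is roughly half the drift here, because the non-concurrent patients make up half of the pooled control group.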
Aug 23, 2022 13 tweets 4 min read
(1/13) At least three papers came out recently on non-concurrent controls in platform trials. This is a rapidly evolving area both for academics and for regulators. I’m going to try to summarize them, as they all have overlapping content. 4 threads…unless another paper appears!

(2/13) First…what are non-concurrent controls and why are they complicated? In a platform trial, arms enter and leave while the control is always enrolling. When a new arm enters, there is a prior history of control participants. Those are non-concurrent controls.
Aug 30, 2021 13 tweets 3 min read
(1/n) This is one of my absolute favorite bias examples in adaptive trials. What biases occur and how much should we care about them? This is a simple example, but an interesting one, with unbiasedness feeling very unsatisfactory (with some Bayes mixed in).

(2/n) Ultra simple trial. Coin with unknown p=Prob(heads).
The trial is “adaptive”.
Flip the coin once…if heads, stop; otherwise flip it once more.
Your data will either be H, or TH, or TT.
Goal is to estimate p.
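To make the bias concrete, here is a tiny calculation for this design (my own sketch of the example): the MLE, the proportion of heads among the flips actually made, is biased upward, and one can check that the only estimator unbiased for every p is "1 if the first flip is heads, else 0", which throws away the second flip entirely.

```python
# Possible outcomes of the design: H (stop), TH, TT.
# MLE (proportion of heads among flips made): H -> 1, TH -> 1/2, TT -> 0.
def mle_expectation(p):
    return p * 1.0 + (1 - p) * p * 0.5 + (1 - p) ** 2 * 0.0

# Solving E[T] = p for all p forces T(H)=1, T(TH)=0, T(TT)=0:
# the unique unbiased estimator ignores the second flip.
def unbiased_expectation(p):
    return p * 1.0 + (1 - p) * 0.0

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"p={p:.1f}  E[MLE]={mle_expectation(p):.3f} "
          f"(bias {mle_expectation(p) - p:+.3f})  "
          f"E[unbiased estimator]={unbiased_expectation(p):.3f}")
```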
Aug 27, 2021 17 tweets 3 min read
(1/n) I think there is a place for error rates, multiplicities, and accounting for early stopping within a Bayesian framework, but it depends on where in the process you are looking. How do we value an experiment prior to conducting it, as opposed to valuing the data afterward?

(2/n) I’m just going to focus on “how do you value an experiment prior to conducting it?” This is a practical problem for a funding agency evaluating grants, or for a regulatory agency deciding in advance whether a study is likely to generate sufficient information for approval.
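A minimal sketch of what “valuing the experiment in advance” can look like in practice (illustrative rates, sample size, and threshold, not any specific design): simulate the trial under a null and an effective scenario, apply a Bayesian decision rule, and tally how often it declares success. Those tallies are exactly frequentist type I error and power for the Bayesian rule.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)

def prob_trt_better(y_t, y_c, n, draws=4000):
    """Posterior Pr(p_trt > p_ctrl) under independent Beta(1,1) priors."""
    pt = beta.rvs(1 + y_t, 1 + n - y_t, size=draws, random_state=rng)
    pc = beta.rvs(1 + y_c, 1 + n - y_c, size=draws, random_state=rng)
    return np.mean(pt > pc)

def prob_of_success(p_c, p_t, n=200, threshold=0.975, n_sims=2000):
    """How often the Bayesian rule declares a win under this true scenario."""
    wins = sum(prob_trt_better(rng.binomial(n, p_t), rng.binomial(n, p_c), n) > threshold
               for _ in range(n_sims))
    return wins / n_sims

print("'Type I error' (true rates 30% vs 30%):", prob_of_success(0.30, 0.30))
print("'Power'        (true rates 30% vs 42%):", prob_of_success(0.30, 0.42))
```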
Aug 10, 2021 10 tweets 2 min read
(1/10) It’s worth considering what events a doubling of sample size is likely to detect. Certainly this increase is worthwhile for some events, but will have little impact on our knowledge of others. Here’s what going from N=1250/arm to N=2500/arm could tell us….

(2/10) For safety events, required sample sizes are driven by the baseline rate of the event and the magnitude of change you are trying to detect. Smaller baseline rates require bigger sample sizes, as do smaller effects.
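A rough sketch of that calculation with a standard two-proportion z-test (baseline rates and effect multiples chosen purely for illustration): doubling from N=1250/arm to N=2500/arm adds little for events that were already well powered, and still leaves rare events badly underpowered.

```python
from math import sqrt
from scipy.stats import norm

def power_two_prop(p0, p1, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_a = norm.ppf(1 - alpha / 2)
    se = sqrt(p0 * (1 - p0) / n_per_arm + p1 * (1 - p1) / n_per_arm)
    return norm.cdf(abs(p1 - p0) / se - z_a)

for base in [0.05, 0.01, 0.001]:
    for mult in [2.0, 1.5]:
        p1 = base * mult
        print(f"baseline {base:.3f}, {mult}x increase: "
              f"power {power_two_prop(base, p1, 1250):.2f} at N=1250/arm -> "
              f"{power_two_prop(base, p1, 2500):.2f} at N=2500/arm")
```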
May 19, 2021 18 tweets 6 min read
(1/n) @US_FDA COVID Master protocol guidance out…a review and comments. The guidance is specific to COVID but clearly applicable more generally.

(2/n) First and foremost…FDA is wonderfully encouraging of these designs! Lots of comments on accelerating drug development. Also notes the extra complexity and startup time. Emphasis on “early and often” discussion with FDA.
Apr 22, 2021 24 tweets 6 min read
(1/n) Saw this letter in @NEJM…also covered by journalists. Concurrent vs non-concurrent controls is a complex issue in platform trials. This is a tweetorial on these issues, and a comment on the letter.
nejm.org/doi/full/10.10…

(2/n) Most importantly, the letter addresses @remap_cap in tocilizumab and sarilumab. Both the concurrent and non-concurrent analyses are reported. They agree. This is a non-issue for these particular therapies.
medpagetoday.com/infectiousdise…
Dec 10, 2020 16 tweets 5 min read
(1/n) We often see related datasets (multiple vaccine trials, multiple trials exploring the amyloid hypothesis in Alzheimer’s, basket trials in oncology). How do we think about multiple groups of data at once? A tweetorial on motivating shrinkage estimates….

(2/n) A lot of good examples are basket trials in oncology.

Example ROAR by @VivekSubbiah
thelancet.com/journals/lanon…

We are exploring a targeted therapy in multiple tumor types. Often these trials are single-arm, trying to show superiority to a pre-specified response rate.
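A minimal sketch of shrinkage across baskets (made-up counts, and a normal approximation on the log-odds scale with an assumed between-basket SD tau, not any specific trial's model): each basket's raw rate gets pulled toward the overall mean, more so for the small baskets.

```python
import numpy as np

# Hypothetical basket data: responders / enrolled per tumor type (made-up numbers).
responders = np.array([7, 2, 9, 1, 4])
n = np.array([15, 10, 20, 12, 11])

# Work on the log-odds scale with a normal approximation per basket.
p_hat = (responders + 0.5) / (n + 1)                        # continuity-corrected rates
y = np.log(p_hat / (1 - p_hat))                             # observed log-odds
v = 1 / (responders + 0.5) + 1 / (n - responders + 0.5)     # approx. variances

# Normal-normal shrinkage toward a common mean, with an assumed
# between-basket SD tau (tau -> 0 pools completely, tau -> inf keeps raw rates).
tau = 0.5
w = 1 / (v + tau**2)
mu = np.sum(w * y) / np.sum(w)                              # precision-weighted overall mean
shrunk = (y / v + mu / tau**2) / (1 / v + 1 / tau**2)

for i, (yo, ys) in enumerate(zip(y, shrunk)):
    raw, post = 1 / (1 + np.exp(-yo)), 1 / (1 + np.exp(-ys))
    print(f"basket {i}: raw rate {raw:.2f} -> shrunken estimate {post:.2f}")
```

Small tau pools the baskets almost completely, large tau leaves the raw rates nearly untouched; in a real analysis tau would itself be estimated or given a prior.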
Oct 20, 2020 8 tweets 2 min read
(1/8) Was on a DSMB lately and had a discussion with the sponsor regarding what data they would show us. The sponsor was concerned about “spending alpha” by showing us certain data. So a thread on what spends alpha (efficacy interims? futility? any look at all? etc).

(2/8) In a frequentist trial, you have to control type I error. If the null hypothesis is true, you must have less than an alpha=2.5% (usually) chance of falsely declaring efficacy. The only things which “spend alpha” are actions which might declare efficacy.
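A small simulation of that point (standard normal approximation, information fraction 1/2 at the interim, illustrative only): an uncorrected efficacy look inflates type I error, a futility-only look can only reduce it, and a look with no stopping rule attached, like a DSMB safety review, changes nothing.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n_sims, crit = 100_000, norm.ppf(0.975)   # nominal one-sided 2.5% at each analysis

# Null is true: the interim and final z-statistics are correlated standard
# normals, corr = sqrt(information fraction) = sqrt(0.5).
z_interim = rng.standard_normal(n_sims)
z_final = np.sqrt(0.5) * z_interim + np.sqrt(0.5) * rng.standard_normal(n_sims)

# (a) Efficacy interim: win if either look crosses the nominal boundary.
eff_look = np.mean((z_interim > crit) | (z_final > crit))
# (b) Futility-only interim: stop (lose) if interim z < 0, else test at the end.
fut_look = np.mean((z_interim >= 0) & (z_final > crit))
# (c) The interim data are merely *seen*; no rule acts on them.
no_rule = np.mean(z_final > crit)

print(f"efficacy look, no correction: {eff_look:.3f}  (> 0.025, spends alpha)")
print(f"futility look only:           {fut_look:.3f}  (<= 0.025)")
print(f"look with no stopping rule:   {no_rule:.3f}  (= 0.025)")
```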
Oct 19, 2020 13 tweets 3 min read
(1/n) As requested….The Bayesian perspective on this would simply be “given the data, what is our estimated treatment effect?” This might be a point estimate, a credible interval, the probability of a beneficial treatment effect, or the entire posterior distribution.

(2/n) Bias feels backward to a Bayesian confronted with data. Bias is phrased as “suppose we knew the true rate was X, what would our average data look like?” We are in the reverse situation. We know the exact data, we want to know the true rate.
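A minimal sketch of that Bayesian summary for made-up two-arm binary data (flat Beta priors): one posterior gives the point estimate, the credible interval, and Pr(treatment better) all at once.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(4)

# Made-up data: 40/100 responses on control, 52/100 on treatment.
y_c, n_c, y_t, n_t = 40, 100, 52, 100

# Independent Beta(1,1) priors -> Beta posteriors; summarize by simulation.
pc = beta.rvs(1 + y_c, 1 + n_c - y_c, size=100_000, random_state=rng)
pt = beta.rvs(1 + y_t, 1 + n_t - y_t, size=100_000, random_state=rng)
diff = pt - pc

print("posterior mean difference:", round(diff.mean(), 3))
print("95% credible interval:    ", np.round(np.percentile(diff, [2.5, 97.5]), 3))
print("Pr(treatment better):     ", round(np.mean(diff > 0), 3))
```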
Oct 10, 2020 18 tweets 5 min read
(1/n) Tweetorial…When people talk about a trial “stopping early”, that implies there is an “on time”. Standard trials really “look late”, and interim analyses bridge the gap. Examples in terms of the current vaccine trials. Based on
berryconsultants.com/berry-consulta…

(2/n) Data is noisy…a good therapy can look bad early in a trial, and a null (dud) therapy can look good. A lot of statistical theory is dedicated to quantifying this range. When can we be “sure” the data is good enough that the drug isn’t a dud? When can we be “sure” it works?
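A quick sketch of how often a genuinely effective therapy trails at an early look (illustrative rates and sample size, not any particular trial):

```python
import numpy as np

rng = np.random.default_rng(5)
n_sims, n_final = 50_000, 500          # per-arm final sample size (illustrative)
p_c, p_t = 0.30, 0.36                  # a genuinely effective therapy

for frac in [0.25, 0.50, 1.00]:
    n = int(n_final * frac)
    y_c = rng.binomial(n, p_c, size=n_sims)
    y_t = rng.binomial(n, p_t, size=n_sims)
    trailing = np.mean(y_t <= y_c)     # treatment looks no better than control
    print(f"{int(frac * 100):3d}% enrolled (n={n}/arm): "
          f"treatment trails or ties control in {trailing:.1%} of trials")
```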
Sep 19, 2020 12 tweets 4 min read
(1/n) The Pfizer SARS-CoV-2 vaccine trial has 4 interim analyses. Each interim has a set number of events, and certain splits result in declaring efficacy or futility. If they were to announce “the trial is continuing after interim X”, what might we learn?
pfe-pfizercom-d8-prod.s3.amazonaws.com/2020-09/C45910…

(2/n) First off, I wouldn’t make these announcements. The trial is continuing per protocol. We need the trial to finish per protocol to conclude anything. I think people misinterpret these announcements all the time (hence this tweetorial….). Other opinions differ….
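For a feel of what “continuing” could convey, here is a sketch with entirely hypothetical boundaries, NOT the actual Pfizer design: an interim at 60 total cases that declares efficacy if 15 or fewer fall in the vaccine arm. Under that boundary, continuation makes very high efficacy much less likely but says little about the middle.

```python
from scipy.stats import binom

def p_vaccine_case(ve):
    """Under 1:1 randomization, chance a given case is in the vaccine arm."""
    return (1 - ve) / (2 - ve)

# Entirely hypothetical interim: 60 total cases, declare efficacy if the
# vaccine arm has 15 or fewer of them (numbers invented for illustration).
events, success_cut = 60, 15

for ve in [0.0, 0.3, 0.5, 0.7, 0.9]:
    p_success = binom.cdf(success_cut, events, p_vaccine_case(ve))
    print(f"true VE {ve:.1f}: Pr(declare efficacy here) = {p_success:.3f}, "
          f"Pr(continue past this look) = {1 - p_success:.3f}")
```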
Sep 11, 2020 14 tweets 3 min read
(1/n) I’ve been following discussions of #Bayesian sequential analyses, type I error, alpha spending, etc. At the risk of offending everyone (please be kind!), I see reasons Bayesian sponsors and regulators can still find value in type I error rates and so forth.

(2/n) I’m focused on the design phase. After the trial, the data is the data. Lots of good stuff has been written on the invariance of Bayesian rules to stopping decisions. But in prospectively evaluating a trial design, even for a Bayesian there is a cost to interims, etc.
Aug 1, 2020 12 tweets 3 min read
(1/12) I receive emails warning about hackers targeting COVID data. Huge issues…it’s a crime and a danger to trial integrity. But it also raises another question…if your loved one could be randomized, but you could instead just select the current best arm, how much is gained?

(2/12) Your timing matters. The information you could use to select an arm accumulates over time.
The design matters. Some designs treat patients within the trial better.
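A toy simulation of the “timing matters” point (made-up rates): picking the observed leader helps more the later you get to choose, because early leads are often wrong.

```python
import numpy as np

rng = np.random.default_rng(6)
p = np.array([0.30, 0.40])             # true response rates (arm 1 is better)
n_sims = 50_000

for n_seen in [20, 100, 400]:          # per-arm data accrued when you choose
    y = rng.binomial(n_seen, p, size=(n_sims, 2))
    pick = np.argmax(y, axis=1)        # "select the current best arm"
    p_pick = p[pick].mean()            # expected response if you pick the leader
    p_random = p.mean()                # expected response under 1:1 randomization
    print(f"after {n_seen}/arm: pick-the-leader gives {p_pick:.3f} "
          f"vs {p_random:.3f} under randomization "
          f"(picks the truly better arm {np.mean(pick == 1):.0%} of the time)")
```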
May 2, 2020 28 tweets 9 min read
(1/n) I recently saw this paper arguing against response adaptive randomization (RAR) in platform trials. There are good counterarguments to the objections…here is a tweetorial on the RAR debate (refs at end) and why RAR is used in many modern platform trials.
ncbi.nlm.nih.gov/pubmed/32222766

(2/n) RAR is an adaptive experimental method that employs multiple interim analyses. At each interim, allocation to poorly performing arms is reduced (perhaps to 0) and allocation to better-performing arms is increased.
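A minimal sketch of one common flavor of RAR (Thompson-style allocation proportional to the posterior probability of being the best arm, with made-up response rates and block sizes; real platform trials use more elaborate rules, often protecting the control allocation):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(7)
true_p = np.array([0.25, 0.30, 0.45])        # control + two experimental arms (made up)
k = len(true_p)
successes, n = np.zeros(k), np.zeros(k)
alloc = np.full(k, 1 / k)                    # start with equal allocation

for interim in range(5):
    # Enroll a block of 90 patients under the current allocation.
    new_n = rng.multinomial(90, alloc)
    new_y = rng.binomial(new_n, true_p)
    n += new_n
    successes += new_y

    # Posterior draws for each arm's response rate (Beta(1,1) priors).
    draws = beta.rvs(1 + successes, 1 + n - successes,
                     size=(4000, k), random_state=rng)
    p_best = np.bincount(np.argmax(draws, axis=1), minlength=k) / 4000

    # Allocate the next block proportional to Pr(arm is best), floored at 10%.
    alloc = np.maximum(p_best, 0.10)
    alloc /= alloc.sum()
    print(f"interim {interim + 1}: n per arm = {n.astype(int)}, "
          f"next allocation = {np.round(alloc, 2)}")
```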
Apr 7, 2020 20 tweets 5 min read
(1/20) Tweetorial on this statement…

“…red flag…The investigators….already taken an interim look…did not stop the study early…dampens optimism for an overall positive outcome”

Is this a red flag? Maybe…maybe not…it depends on how optimistic you were to start.

(2/20) I don’t have the details of the interims. If someone has them, I’ll follow up on this specific trial. Without them, I’m going to discuss the general principles. The principles are easier to see in a single-arm trial. They generalize completely to two arms.
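A sketch of “depends on how optimistic you were”, using a made-up single-arm design (interim at 50 of 100 patients, stop early for efficacy only at 35+ responses) and two coarse priors: the same “continued past the interim” fact lowers the probability of a big effect for both, but the optimist still ends up far more hopeful than the skeptic.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical single-arm design: interim at n=50, stop early for efficacy
# only if 35+ responses are seen (numbers invented for illustration).
n_interim, eff_cut = 50, 35

# Coarse prior over the true response rate p, for an optimist vs a skeptic.
p_grid = np.array([0.40, 0.55, 0.70])
optimist = np.array([0.2, 0.3, 0.5])
skeptic = np.array([0.5, 0.3, 0.2])

# Likelihood of *continuing*, i.e. fewer than 35/50 responses, for each p.
p_continue = binom.cdf(eff_cut - 1, n_interim, p_grid)

for name, prior in [("optimist", optimist), ("skeptic", skeptic)]:
    post = prior * p_continue
    post /= post.sum()
    print(f"{name}: prior Pr(p=0.70) = {prior[2]:.2f} -> "
          f"posterior after 'continued' = {post[2]:.2f}")
```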
Feb 18, 2020 17 tweets 3 min read
1) Utilizing any external data in a clinical trial, including real world evidence, presents risks and benefits. These should be quantified and balanced by choosing an appropriate design (a tweetorial explaining this graph…)

2) Simple example…dichotomous outcome, you have a novel therapy and resources for 60 patients. You also have a database of untreated patients showing a 40% response rate. How can you incorporate the database into the trial?
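One simple way to incorporate it is a power prior, sketched here with made-up counts (a 400-patient database at 40%, and a single-arm trial putting all 60 new patients on the novel therapy): the historical controls enter the control prior downweighted by a factor a0 between 0 (ignore them) and 1 (pool them outright).

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(8)

# Historical database: 400 untreated patients, 40% response (size is made up).
y_hist, n_hist = 160, 400
# New single-arm trial of the novel therapy: 60 patients, say 33 responses.
y_new, n_new = 33, 60

for a0 in [0.1, 0.5, 1.0]:            # power-prior weight on the historical data
    # Historical controls enter the control prior downweighted by a0;
    # a0 = 0 would discard the database entirely.
    ctrl = beta.rvs(1 + a0 * y_hist, 1 + a0 * (n_hist - y_hist),
                    size=100_000, random_state=rng)
    trt = beta.rvs(1 + y_new, 1 + n_new - y_new, size=100_000, random_state=rng)
    print(f"a0={a0:.2f}: Pr(new therapy beats control) = {np.mean(trt > ctrl):.3f}")
```

Choosing a0 (or letting the data choose it, as in commensurate or robust mixture priors) is exactly the design decision that balances the risk and benefit of borrowing.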
Dec 8, 2019 7 tweets 2 min read
1) If you achieve an observed effect in a (standard) study equal to the effect you powered for, the one-sided p-value is 0.0006. This has two implications, the first on avoiding “significant but meaningless” results, and the second on resource allocation (tweetorial).

2) Standard clinical trials, testing means and proportions with alpha=0.025, protect against bad luck. If you have 90% power for an effect X, you reject the null for anything above 0.604 X. This protects you against missing an effective therapy that was unlucky in your trial.
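The arithmetic behind both numbers (standard normal approximation, one-sided alpha = 0.025, 90% power): the design places the powered effect X at z_alpha + z_beta ≈ 3.24 standard errors above zero, so the rejection boundary sits at 1.96/3.24 ≈ 0.60 of X, and observing exactly X gives p ≈ 0.0006.

```python
from scipy.stats import norm

alpha, power = 0.025, 0.90
z_alpha = norm.ppf(1 - alpha)          # ~1.96
z_beta = norm.ppf(power)               # ~1.28

# A standard design sets the SE so the powered effect X sits
# (z_alpha + z_beta) standard errors above zero.
fraction_to_reject = z_alpha / (z_alpha + z_beta)
p_at_powered_effect = 1 - norm.cdf(z_alpha + z_beta)

print(f"reject for observed effects above {fraction_to_reject:.4f} * X")
print(f"one-sided p-value if you observe exactly X: {p_at_powered_effect:.4f}")
```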