Well, I didn’t know of one, but I was sufficiently interested in Jack’s question that I decided to try my hand at this, so here goes.
Motivation: most papers that I see looking at survival use a time-to-event analysis, but occasionally I do see papers that treat death at a fixed time point as a binary outcome (e.g. “1 year mortality”) & use logistic regression rather than performing a time-to-event analysis
Jack’s recent tweet got me thinking: how often would using logistic regression with a binary outcome actually return different results than a proper time-to-event analysis?
Disclaimer, part 1: I’m an *applied* statistician, and very rarely write methods papers; similarly, I use simulations to screw around and think through problems, but have never written a methods paper using simulations.
Disclaimer, part 2, for the Bayesians out there: the initial question asked about power, so the results shown here are vanilla NHST, asking: in the same dataset, how often would a time-to-event analysis (Cox) conclude there is a treatment effect where a logistic regression would not?
Disclaimer, part 3: for the time being, I ignored stuff like missing data, censoring before the end of follow-up but without experiencing the outcome, adjustment for baseline covariates, etc. Those would have varying degree of influence, but I wanted to keep this simpler for now
Here’s the procedure:

Simulate a dataset of survival times for “Treatment A” and “Treatment B” (information on distributions will follow)
Create a yes/no indicator for whether the patient survived to t=365 (“one year”); then set all survival times >= 365 to exactly 365 (i.e. all patients still alive at one year are “censored” at that time);
Estimate effect of treatment assignment using logistic regression (yes/no alive at one year), store p-value for treatment effect; estimate effect of treatment assignment using Cox PH regression (time-to-event w/all patients alive at t=365 censored at t=365), store p-value
Repeat for 1,000 simulations; compare how many of the trials would reject Ho if analyzed using logistic regression versus Cox PH regression
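(To make that concrete, here’s a minimal R sketch of what one iteration of this procedure could look like, using rweibull() for the survival times, glm() for the logistic model, and coxph() from the survival package for the time-to-event model. The function and object names are just illustrative placeholders, not necessarily the exact code behind the results below.)

```r
# Minimal sketch of the simulation procedure above (illustrative names, not the exact code used here)
library(survival)

sim_trial <- function(n_per_group, shape_a, scale_a, shape_b, scale_b, t_end = 365) {
  # 1) simulate Weibull survival times for the two treatment groups
  d <- data.frame(
    group = rep(c("A", "B"), each = n_per_group),
    time  = c(rweibull(n_per_group, shape = shape_a, scale = scale_a),
              rweibull(n_per_group, shape = shape_b, scale = scale_b))
  )
  # 2) yes/no indicator for death before t = 365, then censor everyone still alive at 365
  d$died_1yr <- as.numeric(d$time < t_end)
  d$event    <- d$died_1yr
  d$time     <- pmin(d$time, t_end)
  # 3) fit both models and keep the p-value for the treatment term
  p_logit <- summary(glm(died_1yr ~ group, family = binomial, data = d))$coefficients["groupB", 4]
  p_cox   <- summary(coxph(Surv(time, event) ~ group, data = d))$coefficients["groupB", 5]
  c(p_logit = p_logit, p_cox = p_cox)
}

# 4) repeat 1,000 times and cross-tabulate which analyses reject Ho at alpha = 0.05
# (these shape/scale values happen to be the Scenario #1 settings described below)
res <- replicate(1000, sim_trial(200, shape_a = 1, scale_a = 526, shape_b = 1, scale_b = 800))
table(cox = res["p_cox", ] < 0.05, logistic = res["p_logit", ] < 0.05)
```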
I generated survival times for each group using a Weibull distribution, messing around with the shape and scale parameters to get survival times for different scenarios
(sidebar: there’s nothing quite like one of these problems to make you dust off that stuff you learned in first year of graduate school about the different distributions, their cumulative distribution functions, etc)
SCENARIO #1
Not really knowing where to begin, I started with a shape parameter of 1 and a scale parameter of 526 because that makes the median survival time about t=365 by the CDF (so about half the simulated patients will have the event in the first year). Cool, that’s one place to start
Okay, now I need a comparison group. Let’s make the scale parameter…800. Why? Well, it’s a nice round number, and translates to about 37% of the patients having the event in the first year by the CDF, which seems like a big but realistic “treatment effect” for our simulation.
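(If you want to check those numbers yourself, R’s pweibull() gives the Weibull CDF directly; a quick sanity check of the two scale parameters:)

```r
# P(event within the first year) under each set of Weibull parameters
pweibull(365, shape = 1, scale = 526)   # ~0.50: median survival is roughly one year
pweibull(365, shape = 1, scale = 800)   # ~0.37: the "treatment effect" arm
```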
Here’s an example of one simulated dataset using those shape and scale parameters with n=200 per group (black line: shape=1, scale=526; red line, shape=1, scale=800).
Worth noting: in the simulated datasets, we “know” there is a “treatment effect” because we have generated the data to have this distribution: a sufficiently powered study should conclude there is a treatment effect, so the “right” answer is rejecting Ho.
At n=200 per group, this scenario should have about 80% power using time-to-event analysis (this could be a little variable depending on exactly how you do the power calculations), meaning 80% of the trials would reject Ho under these settings.
Anyway, in my first 1,000 simulated trials: 759 would reject Ho (alpha=0.05) using both Cox regression AND logistic regression; 27 would have rejected Ho using Cox but not logistic; 9 would have rejected Ho using logistic, but not Cox; and 205 would not have rejected Ho in either
Interesting! So the Cox regression would reject Ho 786 times out of 1,000 (in the right ballpark of “80% power”) – but in only 27 of those did the logistic regression *fail* to reject Ho where the Cox model did. I would have guessed that number would be higher.
Well, that raises the question: under what scenarios is this different? In other words, when does tossing the time-to-event information really cost you, and when does it not matter all that much?
The first genius revelation: if the time-to-event data is generated in such a way that there’s a really big difference in survival at the end time point, of course the logistic model will still reject Ho most of the time.
This will be more costly (in terms of statistical power, anyway) when the survival proportions at the end are (relatively) closer, but the *shape* of the curves is different.

(something something proportional hazards, I know; we’ll discuss that in the COMMENTS section…)
SCENARIO #2

Let’s change it up a little bit and see what happens! I wanted to look at a scenario with a bigger sample size, and a little smaller gap in the “1 year survival” - where the shape of the curves might play a bigger role.
For Treatment A, let’s go with a shape=0.5 and scale=800. The Weibull CDF tells me that this translates to an estimated 49.1% of patients having the event during the first year.
For Treatment B, we’ll change the parameters to shape=1 and scale=633. The Weibull CDF tells me that this translates to an estimated 43.8% of patients having the event during the first year.
Let’s also go up to n=500 per group. Here’s an example of one simulated dataset using those shape and scale parameters with n=500 per group (black line: shape=0.5, scale=800; red line, shape=1, scale=633).
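(For anyone following along at home, here’s a quick sketch of how one such dataset and its Kaplan-Meier curves could be drawn with survfit(); again, this is illustrative rather than the exact plotting code used here.)

```r
library(survival)

# one simulated Scenario #2 dataset, n = 500 per arm
set.seed(1)
n <- 500
d <- data.frame(
  group = rep(c("A", "B"), each = n),
  time  = c(rweibull(n, shape = 0.5, scale = 800),   # ~49.1% have the event by t = 365
            rweibull(n, shape = 1,   scale = 633))   # ~43.8% have the event by t = 365
)
d$event <- as.numeric(d$time < 365)
d$time  <- pmin(d$time, 365)

# Kaplan-Meier curves for the two arms (black = shape 0.5 / scale 800, red = shape 1 / scale 633)
fit <- survfit(Surv(time, event) ~ group, data = d)
plot(fit, col = c("black", "red"), xlab = "Days", ylab = "Proportion surviving")
```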
Look at this one in comparison to the first Figure. The curves are a little smoother because of the increased N, and the shape of the black line is subtly different because of the shape parameter (initial drop off is a little steeper and then flattens out)...
...but at a glance, the “treatment effect” doesn’t look all *that* different in these two situations, does it?
So who wants to guess what happens when we do our “how often do we reject Ho with logistic vs Cox models” simulation?

Survey SAYS, in one thousand simulated trials:
382 trials would reject Ho (alpha=0.05) using both Cox regression AND logistic regression; 445 (!!!) trials would have rejected Ho using Cox regression but not logistic; 0 trials would have rejected Ho using logistic, but not Cox; and 173 would not have rejected Ho in either analysis.
Zoinks, Scoob! Under these conditions, almost *half* of the simulated trials would conclude that there was a treatment effect with Cox regression, but would not conclude that there is a treatment effect using the binary “1 year survival” outcome.
Anyway, this was all just for fun. I’ll try to formalize it a bit more later, once I’ve been able to think through different variations. Happy to share the R code if requested; at some point I’ll get around to posting it.