PROMISE Diabetes Substudy Tweetorial...

I was planning a much funnier tweetorial with polls and illustrations and the like, but due to competing deadlines and tons of clinical work today, this will have to do.
First, as a general principle it is dangerous to do subgroup analyses of negative trials.
Why is this?

A "negative" trial implies that the effect size was too small to be demonstrated in the population size studied.
What a subgroup analysis does is decrease the population size further in hopes of finding an effect....

This is paradoxical.
...Unless the effect is markedly stronger in the subgroup.
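To put rough numbers on that paradox, here is a minimal sketch of statistical power for a two-arm trial using a normal approximation. The event rates and sample sizes are hypothetical, picked only for illustration, and are not PROMISE data.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_control - p_treated) / se
    # Ignores the tiny chance of a "significant" result in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical numbers (assumptions, not trial data).
full_trial = power_two_proportions(0.030, 0.024, n_per_arm=5000)
subgroup_same_effect = power_two_proportions(0.030, 0.024, n_per_arm=1000)
subgroup_strong_effect = power_two_proportions(0.030, 0.012, n_per_arm=1000)

print(f"full trial, modest effect:     {full_trial:.2f}")
print(f"subgroup, same modest effect:  {subgroup_same_effect:.2f}")
print(f"subgroup, much stronger effect: {subgroup_strong_effect:.2f}")
```

With the same effect size, the smaller subgroup has less power than the full trial. Power only recovers if the subgroup effect is assumed to be much larger.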
But then we run into a problem of multiplicity. What that means is that if we test for an effect in many different subgroups, by chance we will find some that have P<0.05. Some could be much less than 0.05.
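A quick way to see the size of the multiplicity problem: if every subgroup test is independent and there is truly no effect anywhere (both are simplifying assumptions), the chance of at least one P<0.05 grows quickly with the number of subgroups tested. The function name below is made up for this sketch.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent null tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 12, 20):
    print(f"{n_tests:2d} subgroup tests -> "
          f"{familywise_error_rate(n_tests):.0%} chance of a spurious P<0.05")
```

Real subgroup tests are rarely fully independent, so treat this as a rough guide rather than an exact figure.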
There are very clever Bayesian ways of dealing with that, but most trials deal with this by two methods.

1. Adjusting for multiple comparisons: demand a lower p-value when checking many subgroups
2. Pre-specifying which subgroups will be tested AND how they will be tested
If you don't do this, you can often find crazy subgroups that show significant effect heterogeneity (meaning one group has a much stronger effect than the other)
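Here is a small simulation sketch of that problem. It generates trials where the treatment does nothing, splits patients into 12 arbitrary subgroups (think star signs), and counts how often at least one subgroup looks "significant", both at the usual 0.05 cutoff and at a Bonferroni cutoff of 0.05/12. The trial size, event rate and the simple pooled z-test are assumptions chosen for illustration only.

```python
import random
from math import erfc, sqrt


def two_sided_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    if n_a == 0 or n_b == 0:
        return 1.0
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (events_a / n_a - events_b / n_b) / se
    return erfc(abs(z) / sqrt(2))


def smallest_subgroup_p(rng, n_patients=1200, n_subgroups=12, event_rate=0.3):
    """Simulate one trial with NO true effect; return its smallest subgroup p."""
    # counts[subgroup][arm] = [events, patients]
    counts = [[[0, 0], [0, 0]] for _ in range(n_subgroups)]
    for _ in range(n_patients):
        cell = counts[rng.randrange(n_subgroups)][rng.randrange(2)]
        cell[0] += rng.random() < event_rate  # same risk in both arms
        cell[1] += 1
    return min(two_sided_p(a[0], a[1], b[0], b[1]) for a, b in counts)


def false_alarm_rates(n_trials=200, n_subgroups=12, alpha=0.05, seed=0):
    """Share of null trials with at least one 'significant' subgroup."""
    rng = random.Random(seed)
    min_ps = [smallest_subgroup_p(rng, n_subgroups=n_subgroups)
              for _ in range(n_trials)]
    unadjusted = sum(p < alpha for p in min_ps) / n_trials
    bonferroni = sum(p < alpha / n_subgroups for p in min_ps) / n_trials
    return unadjusted, bonferroni


unadjusted, bonferroni = false_alarm_rates()
print(f"unadjusted: {unadjusted:.0%} of null trials have a 'significant' subgroup")
print(f"Bonferroni: {bonferroni:.0%}")
```

The Bonferroni cutoff is only one of several ways to adjust for multiple comparisons. It is used here because it is the simplest to show.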
The most famous example of this I know is the ISIS-2 trial:

ncbi.nlm.nih.gov/pubmed/2899772
I'm not adding figures for lack of time, but they found that for patients born under Gemini or Libra, ASA had a non-significant increase in mortality, while for all other signs there was a strikingly favorable benefit (P<0.00001) (yep, that's five decimal places rather than the usual 3-4).
But wait, wasn't a diabetes analysis prespecified in PROMISE? Yes indeed it was. But the authors had already published an analysis on diabetes in PROMISE. This one was a second, *ad hoc* analysis without a detailed plan.

That leads to a concept called the garden of forking paths...
That concept was put forth in those terms by @StatModeling to describe how without a prespecified plan one can make many different analytic choices and come to an utterly unreliable conclusion.
There is more at stake here though than just multiplicity of testing and lack of pre-specification!
Whenever an RCT is analyzed for the primary outcome, in nearly all cases, intention to treat analysis is the preferred analytic method (there are some specific caveats and exemptions, but these are subtle).
Here is a great @ProfDFrancis tweetorial on why Intention to Treat (ITT) is almost always the right answer:

Many people hate ITT, almost always because other types of analysis give them answers they like better (i.e. answers with P<0.05).
So what does ITT have to do with the PROMISE DM substudy?
Well, in this substudy (oops, I forgot to link it above, so here it is), they exclude anyone with a non-diagnostic scan.

onlinejacc.org/content/73/8/8…
They do this in both arms, which seems rather sensible, doesn't it?

How can a scan possibly help someone if it is non-diagnostic?
It can't possibly help. That said, the reasons for non-diagnostic scans are very important.
A common reason for a non-diagnostic CCTA is heavy calcification: not normal, but not clear how abnormal. It is often unclear whether the disease is mildish or severeish.
This is important because diabetics are more likely to have calcifications than non-diabetics.
And calcifications are a sign of the presence of plaque, which is higher risk than no plaque at all.
This is critical now: the probability of a non-diagnostic CCTA very well might be related in some non-trivial way to the extent of CAD.
* scans with no CAD at all have no CAC and therefore are not non-diagnostic for this reason (could be for other reasons)

* scans with lots of CAC might be non-diagnostic but are still high risk
Compare that with a stress EKG. Having repolarization abnormalities can make a treadmill test non-diagnostic, but this type of abnormality carries only a very small risk for CAD complications.
Bad emphysema may cause a stress echo to be non-diagnostic, and that does have some connection to CAD complications (through smoking, if still or recently active), but it is not a direct connection to CAD risk like CAC, as MESA and other studies have taught us.
So it is very possible, even likely, that excluded CCTA scans (non-diagnostic ones) are higher risk than excluded stress tests.

This creates a serious bias.
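To make the mechanism concrete, here is a toy calculation with entirely made-up numbers (the shares, MI risks and non-diagnostic probabilities are assumptions for illustration, not PROMISE data). Each arm has a high-CAC stratum and a low-CAC stratum. In the CCTA arm the high-CAC patients are much more likely to have a non-diagnostic scan. In the stress arm a non-diagnostic test is assumed to be unrelated to risk.

```python
def observed_event_rate(strata, drop_nondiagnostic):
    """MI rate among the patients who end up in the analysis.

    strata: list of (share_of_patients, mi_risk, p_nondiagnostic) tuples.
    """
    events = 0.0
    analysed = 0.0
    for share, mi_risk, p_nondiagnostic in strata:
        kept = share * (1 - p_nondiagnostic) if drop_nondiagnostic else share
        analysed += kept
        events += kept * mi_risk
    return events / analysed


# Hypothetical strata: (share, MI risk, probability the test is non-diagnostic)
ccta_arm = [(0.3, 0.030, 0.50),    # high CAC: often non-diagnostic
            (0.7, 0.003, 0.05)]    # low/no CAC: rarely non-diagnostic
stress_arm = [(0.3, 0.030, 0.10),  # non-diagnostic unrelated to risk
              (0.7, 0.003, 0.10)]

for name, arm in (("CCTA", ccta_arm), ("stress", stress_arm)):
    itt = observed_event_rate(arm, drop_nondiagnostic=False)
    excl = observed_event_rate(arm, drop_nondiagnostic=True)
    print(f"{name:6s} ITT rate {itt:.2%}, after exclusions {excl:.2%}")
```

Both arms start with identical true risk in this toy setup, yet after exclusions the CCTA arm looks safer purely because of who was dropped.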
And ironically, the study itself presents data in the very same Online Supplemental Table 11 that many have told me to look up.
(I was going to make pretty figures and copy tables, but no time)
In that table, let's look first at the CCTA arm:

The rate of MI in diabetics who undergo CCTA is 0.2% (2/936). This is the row that drives the CCTA vs. stress result to be significant, because in the stress arm the MI rate in diabetics is 1.3% (13/972).
YAY! CCTA FTW. Rip out all the stress equipment. Let's move to CCTA first.
There is a problem. That problem is that we just said we might have excluded the highest risk diabetics who had lots of CAC because of non-diagnostic CCTAs.
There is evidence in this table that this indeed happened.
In the CCTA arm the rates of MI:

Diabetics: 0.2% (2/936)
Non-diabetics: 0.7% (24/3564)
So in the CCTA arm the diabetics had fewer MIs than the non-diabetics!
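For anyone who wants to check the arithmetic, a few lines reproduce the two percentages from the counts quoted above and the implied risk ratio. The variable names are just for this sketch.

```python
def rate(events, patients):
    """Crude event rate as a fraction."""
    return events / patients


# Counts quoted above for the CCTA arm.
diabetic = rate(2, 936)
non_diabetic = rate(24, 3564)
risk_ratio = diabetic / non_diabetic

print(f"diabetics:     {diabetic:.1%}")
print(f"non-diabetics: {non_diabetic:.1%}")
print(f"risk ratio (diabetic vs. non-diabetic): {risk_ratio:.2f}")
```

A risk ratio well below 1 for diabetics is the opposite of what one would expect.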

Turns out that CCTA cures the excess MI risk of DM!!!!
Clearly ridiculous!
Instead, the right conclusion is that we must have whittled away the diabetics with highest MI risk somehow... we did that by not using ITT, I suspect.
Indeed, the rate of MI among diabetics is likely at least 1% (slightly higher than the 0.7% in non-diabetics).
If you assume that the true MI rate in diabetics under ITT was 1% or so (makes sense), then the chi-squared test for CV death + MI comparing CCTA vs. stress testing has P=0.26.

Doing this with a proper Cox model would be better (and just might lead to P<0.05 but most likely not).
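As a rough sketch of this kind of sensitivity check, the snippet below runs a plain Pearson chi-squared test (no continuity correction) on MI alone, first with the counts quoted in this thread and then with the CCTA-arm diabetic MI rate reset to an assumed 1%. The thread does not give the CV death + MI counts, so this will not reproduce the P=0.26 above. It only shows the direction of the change.

```python
from math import erfc, sqrt


def chi2_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-squared test (1 df, no continuity correction)."""
    observed = [[events_a, n_a - events_a], [events_b, n_b - events_b]]
    total = n_a + n_b
    col_totals = [events_a + events_b, total - events_a - events_b]
    chi2 = 0.0
    for row, n_row in zip(observed, (n_a, n_b)):
        for cell, col_total in zip(row, col_totals):
            expected = n_row * col_total / total
            chi2 += (cell - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


ccta_n, stress_events, stress_n = 936, 13, 972  # diabetics, MI only

# As quoted in the thread: 2 MIs among CCTA-arm diabetics.
_, p_reported = chi2_2x2(2, ccta_n, stress_events, stress_n)

# Assumption: the true ITT rate in CCTA-arm diabetics was about 1%.
assumed_events = round(0.01 * ccta_n)
_, p_assumed = chi2_2x2(assumed_events, ccta_n, stress_events, stress_n)

print(f"MI only, counts as quoted:       P = {p_reported:.3f}")
print(f"MI only, assumed 1% in CCTA arm: P = {p_assumed:.2f}")
```

With so few events, a Fisher exact test or a time-to-event model would be more appropriate. The chi-squared test is used here only because it is the test named above.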
So what are our issues:

1. No ITT
2. Not prespecified
3. Small numbers of events
4. Effect size is implausibly large (I didn't cover this, but it was discussed post SCOT-HEART).
5. Hints that functional test pts weren't tx w/ statins as aggressively, even if abnormal. (another day)
Because of time I could not tag people enough on this. I'm counting on #Twitterati to retweet the FIRST tweet in this thread to get people to read it. Please help.

Was this helpful?
Subscribe to Venk Murthy 🥜🥜 Gallery Forever