this kind of crap is what happens when RCTs are automatically the "gold standard" in medical evidence.
</grumbly soapbox rant>
Even if we make the (extremely generous) assumption that failure to account for clustering is the only "major" error, I can't emphasize enough just how damning that error is.
It's both incredibly basic and incredibly important to deal with correctly. How did this happen?
Things are developing in a bad direction, so time to talk a little more about it.
Let's start with the most generous version of this, and say they just made a couple of honest mistakes. It happens! Stats and study design is hard!
Abstract: "Participants (n=551) were randomly assigned to calcifediol treatment"
That is an unambiguous statement that individuals were randomized to arms.
But that's false; taking things at face value, the 8 WARDS were randomized, NOT individuals.
That's super important, since when you are assigning treatment at a group level, the grouping is the important bit, and needs to be dealt with from day 1.
Think along the lines of this being more like n=8 than n=930 (not quite right, but you get the idea).
That's a HUGE deal, as it's effectively impossible to get usable results from an 8-cluster cRCT; with so few clusters there is almost no statistical power left.
In effect, we can't really tell if the results had to do with other stuff inherent to the different wards, or the vit-d "treatment"
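To put a rough number on the "more like n=8" intuition, here's a minimal sketch of the standard design-effect calculation. The ICC values below are made-up assumptions for illustration, not anything estimated from this study.

```python
# Rough illustration of why ~930 people in 8 wards behaves like far fewer
# observations. All numbers below are hypothetical, NOT from the trial.

def effective_n(n_total: int, avg_cluster_size: float, icc: float) -> float:
    """Effective sample size under the usual design effect:
    DEFF = 1 + (m - 1) * ICC, n_eff = n_total / DEFF."""
    deff = 1 + (avg_cluster_size - 1) * icc
    return n_total / deff

n_total = 930                # patients
n_clusters = 8               # wards
m = n_total / n_clusters     # ~116 patients per ward

for icc in (0.01, 0.05, 0.10):  # a plausible range of intraclass correlations
    print(f"ICC={icc:.2f}: effective n ~ {effective_n(n_total, m, icc):.0f}")
# Even a modest ICC shrinks ~930 patients to a fraction of the nominal count,
# and inference still hinges on only 8 cluster-level units.
```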
But then there's another weird thing: If you have only 8 clusters, you would want to split them in half (4v4), not what they did: 5v3.
The reason is, again, statistical power. It's typically (not always) MUCH more efficient to split your groups evenly.
Sounds like a small thing, but it's....weird. Even if you knew nothing about clustered RCTs, you would probably know that you want an even split. So why the 5v3?
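For a rough sense of the 5v3 penalty (pure arithmetic, nothing study-specific): with k1 and k2 clusters per arm, the variance of a difference in arm-level means scales with 1/k1 + 1/k2, which is smallest when the split is even.

```python
# Why 4v4 beats 5v3 on efficiency: the variance of a difference in arm-level
# means scales with (1/k1 + 1/k2), where k1 and k2 are clusters per arm.

def relative_variance(k1: int, k2: int) -> float:
    return 1 / k1 + 1 / k2

even = relative_variance(4, 4)    # 0.500
uneven = relative_variance(5, 3)  # ~0.533
print(f"4v4: {even:.3f}, 5v3: {uneven:.3f}, penalty: {uneven / even - 1:.1%}")
# A 5v3 split inflates the variance by roughly 7% relative to 4v4 -- on top of
# having only 8 clusters to work with in the first place.
```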
If that was all that was wrong, dayenu. This isn't subtlety, this is REALLY basic stuff for trial design/stats.
From here, things get .... weirder.
If this was a properly run trial (assuming the design/stats were legit, which they aren't), according to the paper the trial ended in May, 2020.
What was happening from then to now? It often takes a long time to get and clean the data, sure.
But given a 1k-person RCT (which generally requires a HUGE amount of planning and infrastructure) in a pandemic, you'd want to accelerate that to light speed and get those results out ASAP. Especially if they showed these miraculous results (narrator: they don't)!
Then there is some weirdness about the study population: i.e. the embedding in the cohort.
That's not a problem by itself; it can be a huge time and resource saver. I've developed 2 RCTs to date, both of which were embedded in cohort studies for this very reason.
But the way it's described is ... weird.
There is some weird language here around "hospitalized randomly" implying that they assigned patients to wards randomly. Maybe a language or oversimplification issue, so don't read too much into that.
If everything was done properly as described, we have a new issue: in the consent process, it seems that patients were given options. If they're given options, they might choose (or be encouraged) to go to different wards for various reasons.
That breaks a LOT of things.
If knowing which ward is which changes how patients get assigned to wards (e.g. a patient might want to be sent to the vit-d ward, or a doctor might send them there), then randomization with respect to patient assignment is completely broken by selection.
That's...not great.
Then there's ethics and protocol. The manuscript states that it received ethical approval for the study. Great! In theory, that means there is a protocol for this (these are usually not public) and a trial registration.
We should be able to verify that this was planned this way.
I am personally not familiar with the typical required and standard processes for ethics approval, registration, and protocols in Spain.
If it was approved with a protocol that roughly matches what the manuscript says was done, then this is merely* study design and reporting negligence
* "merely" just means nothing more going on; there would still remain a jaw-dropping series of design and reporting errors.
Also worth noting that there are HUGE differences in the baseline levels of vit-d in the trial arms.
Don't fall into the trap of believing that randomization means that the arms are "balanced." That's not true. Differences are both expected and totally ok when things are done right.
But that's a HUGE difference, suggesting that there are fundamental differences between arms.
At best, this is some combination of ward-specific protocols and procedures that happened to line up with the arms (small n's do that), plus the patient selection issue.
Again that would be enough to be super sus.
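As a toy illustration of the "small n's do that" point: with only 8 wards, chance alone can produce sizable baseline gaps between arms. All numbers here are invented for illustration, not taken from the study.

```python
# Simulate ward-level baseline means and see how far apart a 5-ward arm and a
# 3-ward arm land purely by chance. Hypothetical numbers only.
import numpy as np

rng = np.random.default_rng(2)
n_sims, n_wards = 10_000, 8
baseline_mean, ward_sd = 16.0, 4.0   # pretend ward-level vit-D means (ng/mL)

gaps = []
for _ in range(n_sims):
    ward_means = baseline_mean + rng.normal(0, ward_sd, n_wards)
    arm_a, arm_b = ward_means[:5].mean(), ward_means[5:].mean()  # mimic 5v3
    gaps.append(abs(arm_a - arm_b))

print(f"median baseline gap between arms: {np.median(gaps):.1f} ng/mL")
print(f"95th percentile gap: {np.percentile(gaps, 95):.1f} ng/mL")
```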
Altogether though, that is a LOT of issues that kinda happen to fall into place for these results.
So, at best, this is study design and reporting negligence on a lot of dimensions plus a little push from random chance.
Best bet is assuming this is the case.
HOWEVER, given the weirdness about these errors (and a clear willingness to play fast and loose with the word "randomized") we should do some due diligence here and verify that's the case, which is what is happening now.
This should all be pretty cut and dry if the ethics approval and protocol turn up and describe this RCT. If they don't...
In the meantime, we are playing catchup, since this study is blowing up all over the place with unscrupulous sharers and media reports.
This has the makings of yet another HCQ-type debacle (albeit probably not as big) with a hint of DANMASK.
Let's do our best to not let that happen.
If you want a more "live" look at this, @sTeamTraen has been at work here (and has a sterling reputation for discovering all kinds of research mistakes and misconduct), as has well-known research trouble-maker @GidMK.
I'll probably update when I'm more sure of things.
Well, we have an answer from the authors...of sorts. Copied here, because this sure is something.
What on earth does this mean, and how can you possibly square it with what's in the abstract and manuscript itself?
"We never say in the article that it is a randomized control trial (RCT) but we consider an open randomized trial, and an observational study."
The study describes randomization (implied at the individual level, but actually at the ward level). It describes a control (not receiving the "treatment") and it directly describes it as a trial.
There is ZERO question that, if we take them at their word, this is an RCT.
"Formal ethical approval was obtained shortly after the study started although verbal approval from the ethics committee was given at the time it was started while we completed all the bureaucratic process."
Oh. Oh no. That is very, very not ok.
In case there is any doubt at all, this is a direct quote from the manuscript, page 5:
"The effect of calcifediol administration was studied in a prospective open randomized controlled trial."
This was clearly suspicious from the start, but I am quite honestly pretty shocked, and did not expect this result.
There are still some open questions, but at minimum this is a major violation of public trust and ethics, not to mention scientific and statistical rigor.
I hope the authors do the right thing and pull it. There is no version of this situation which can save it.
Best we can do is be honest with our errors and move forward.
Also, folks: please don't take it upon yourselves to try to "fix this" by leaving inappropriate comments or feedback.
The authors already have all the information they need, from well-qualified folks who do this kind of thing.
Let this run its course, no need for more attention.
An update (not doing play by plays): @sTeamTraen has been doing some excellent work following up and figuring out the ethics approval situation for this study, and things are looking .... not great.
The number referenced in the pre-print was a local registration number (NOT an ethical approval), from a body that was not informed about the study until 60ish days into the study.
The PI specified that the study was approved by an external ethics board (totally normal).
@sTeamTraen is checking in with the referenced external ethics board, but noting that there doesn't appear to be any study registered that seems to match this one (ward-randomized vitamin-D trial etc.).
Could be an administrative issue or a miscommunication (it happens!).
Three possibilities here:
1) IRB approval exists, but was misreported/admin errors.
2) There was never randomization in the first place (i.e. no trial).
3) There was neither approval nor consent for this trial, contrary to authors' claims.
I sincerely, truly hope for #1.
.....and it's gone! The Lancet SSRN removed it from their server.
This is a pretty unusual move for a pre-print server to make; it usually only happens in high-profile and extreme situations.
Hopefully this is the end of the story. Lots of mysteries remain, like what actually happened in this study, the ethics situation, etc.
In an ideal world, the authors and all the people unscrupulously promoting this study would work hard to undo the damage done and prevent the next round.
Not gonna happen though.
We desperately need to invest in our research infrastructure and community to do better designed and more ethical research, and prevent this kind of thing from happening in the first place.
To rewind a bit: this started as a story about "just" study/stats design (clustering standard errors), and that should have been the end.
"Just" a stats issue was game over before it started.
But it's "just" stats and study design, so that's not enough.
What unfolded was frankly bonkers. It was "sus" but I would never have expected just how bad it was (and there's still more we don't know).
But it makes me deeply uncomfortable that the original fatal, basic, and unrecoverable flaw isn't enough to have prevented all of this.
Huge amounts of credit to data thug extraordinaire @sTeamTraen for pursuing this, and to all the folks in the back channels discussing and looking into things.
That kind of service to the public and science goes almost entirely unrewarded, and usually at cost to the ones doing it.
Folks often say that DAGs make our causal inference assumptions explicit. But that's only kinda true
The biggest assumptions in a DAG aren't actually IN the DAG; they're in what we assume ISN'T in the DAG. It's all the stuff that's hidden in the white space.
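A toy way to see it: count the edges you drew against the "no direct effect" claims implied by every edge you didn't draw. Hypothetical 4-node DAG, nothing specific to any study.

```python
# Every absent directed edge in a DAG is an assumption: "no direct effect of A on B."
from itertools import permutations

nodes = ["exposure", "outcome", "confounder", "mediator"]
edges = {("confounder", "exposure"), ("confounder", "outcome"),
         ("exposure", "mediator"), ("mediator", "outcome")}

implicit = [(a, b) for a, b in permutations(nodes, 2) if (a, b) not in edges]
print(f"{len(edges)} drawn edges vs {len(implicit)} implicit 'no direct effect' claims:")
for a, b in implicit:
    print(f"  assumed: {a} has no direct effect on {b}")
```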
Time to make it official: short of some unbelievably unlikely circumstances, my academic career is over.
I have officially quit/failed/torpedoed/given up hope on/been failed by the academic system and a career within it.
To be honest, I am angry about it, and have been for years. Enough so that I took a moonshot a few years ago to do something different that might change things or fail trying, publicly.
I could afford to fail since I have unusually awesome outside options.
And here we are.
Who knows what combination of things did me in; incredibly unlucky timing, not fitting in boxes, less "productivity," lack of talent, etc.
In the end, I was rejected from 100% of my TT job and major grant applications.
Always had support from people, but not institutions.
Ever wondered what words are commonly used to link exposures and outcomes in health/med/epi studies? How strongly language implies causality? How strongly studies hint at causality in other ways?
READ ON!
Health/med/epi studies commonly avoid using "causal" language for non-RCTs to link exposures and outcomes, under the assumption that "non-causal" language is more "careful."
But this gets murky, particularly if we want to inform causal q's but use "non-causal" language.
To find answers, we did a kinda bonkers thing:
GIANT MEGA INTERDISCIPLINARY COLLABORATION LANGUAGE REVIEW
As if that wasn't enough, we also tried to push the boundaries on open science, in hyper transparency and public engagement mode.
Granted, we only see the ones that get caught, so "better" frauds are harder to see.
But I think people don't appreciate just how hard it is to make simulated data that don't have an obvious tell, usually because something is "too clean" (e.g. the uniform distribution here).
At some point, it's just easier to actually collect the data for real.
BUT.
The ones that I think are going to be particularly hard to catch are the ones that are *mostly* real but fudged a little haphazardly.
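For the "too clean" flavor of tell specifically, here's a toy sketch (entirely simulated, not anyone's actual data): a lazily fabricated variable drawn straight from a uniform distribution sails through a uniformity check that realistic, lumpy clinical data flunks.

```python
# Toy "too clean" check: test whether a variable is suspiciously consistent with
# a flat (uniform) distribution. Simulated data only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
fabricated = rng.uniform(20, 80, size=500)                   # lazy fake: flat everywhere
realistic = np.clip(rng.normal(50, 12, size=500), 20, 80)    # lumpy, like real labs

for name, x in [("fabricated", fabricated), ("realistic", realistic)]:
    # Rescale to [0, 1] and run a KS test against the standard uniform.
    stat, p = stats.kstest((x - 20) / 60, "uniform")
    print(f"{name}: KS p-value vs uniform = {p:.3g}")
# The fabricated column looks "perfectly" uniform; the realistic one is flagged
# as decidedly non-uniform. Real biological measurements are rarely flat.
```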
Perpetual reminder: cases going up when there are NPIs (e.g. stay at home orders) in place generally does not tell us much about the impact of the NPIs.
Lots of folks out there making claims based on reading tea leaves from this kind of data and shallow analysis; be careful.
What we want to know is what would have happened if the NPIs were not there. That's EXTREMELY tricky.
How tricky? Well, we would usually expect cases/hospitalizations/deaths to have an upward trajectory *even when the NPIs are extremely effective at preventing those outcomes.*
The interplay of timing, infectious disease dynamics, social changes, data, etc. make it really really difficult to isolate what the NPIs are doing alongside the myriad of other stuff that is happening.
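A toy SIR sketch (every parameter invented) of why "cases rose while the NPI was in place" is weak evidence on its own: even a transmission-cutting NPI can leave cases rising for weeks, just more slowly than they otherwise would have.

```python
# Minimal SIR model comparing a hypothetical no-NPI world to one where the NPI
# cuts transmission by ~40%. All parameters are made up for illustration.

def sir_daily_incidence(beta: float, gamma: float = 0.1, days: int = 60, i0: float = 1e-4):
    s, i = 1 - i0, i0
    daily_new = []
    for _ in range(days):
        new = beta * s * i
        s, i = s - new, i + new - gamma * i
        daily_new.append(new)
    return daily_new

no_npi = sir_daily_incidence(beta=0.35)    # hypothetical "do nothing" transmission
with_npi = sir_daily_incidence(beta=0.22)  # hypothetical NPI cutting transmission

print(f"peak daily incidence: no NPI {max(no_npi):.4f}, with NPI {max(with_npi):.4f}")
print("cases still rising at day 30 even with the NPI:", with_npi[30] > with_npi[15])
# Both curves rise early on; the counterfactual difference is in how far and how
# fast -- which you cannot read off the single trajectory you actually observe.
```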
The resistance to teaching regression discontinuity as a standard method in epi continues to be baffling.
I can't think of a field for which RDD is a more obviously good fit than epi/medicine.
It's honestly a MUCH better fit for epi and medicine than econ, since healthcare and medicine are just absolutely crawling with arbitrary threshold-based decision metrics.
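For anyone who hasn't seen one, here's a minimal RDD sketch on simulated data, using a hypothetical lab-value cutoff that triggers treatment (none of this is from a real study):

```python
# Toy regression discontinuity: a hypothetical clinical rule treats everyone whose
# score crosses a threshold; compare outcomes just on either side of the cutoff.
import numpy as np

rng = np.random.default_rng(1)
n, cutoff, true_effect = 5000, 50.0, -2.0

score = rng.uniform(30, 70, n)                    # running variable (e.g. a lab value)
treated = (score >= cutoff).astype(float)         # threshold-based treatment rule
outcome = 0.3 * score + true_effect * treated + rng.normal(0, 3, n)

# Local linear fits within a bandwidth on each side of the cutoff
bw = 5.0
left = (score >= cutoff - bw) & (score < cutoff)
right = (score >= cutoff) & (score < cutoff + bw)
fit_l = np.polyfit(score[left], outcome[left], 1)
fit_r = np.polyfit(score[right], outcome[right], 1)

rdd_estimate = np.polyval(fit_r, cutoff) - np.polyval(fit_l, cutoff)
print(f"RDD estimate at the cutoff: {rdd_estimate:.2f} (true effect = {true_effect})")
```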
(psssssst to epi departments: if you want this capability natively for your students and postdocs - and you absolutely do - you should probably hire people with cross-disciplinary training to support it)