As more problems continue to surface with the @NEJM and @TheLancet papers using the Surgisphere 'data', there's another possibility that has occurred to me that I want to play out.
I've been poring over these numbers for a few days and have not yet found a purely "statistical" smoking gun: a mean that cannot exist, a confidence interval that can't exist, etc.
Thus far, most of the prevailing sentiment that this data isn't real seems to rest on circumstantial evidence: not much sign that the company exists as described, insider knowledge of how hard it is to connect EHR data across hospitals, etc.
And that is all pretty convincing, but I wanted to find a statistical 'proof' - something like what ultimately exposed the Wansink papers and other frauds. I wanted to find numbers that cannot exist. And so far, I haven't found any.
(That doesn't mean they aren't there, just that I haven't found them yet)
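For a concrete sense of what such a check looks like, here is a minimal sketch of the GRIM test, the kind of granularity check that helped expose the Wansink papers: a mean of n integer-valued observations must equal some integer total divided by n, so a reported mean that no integer total can produce is impossible. The function name and example values below are my own illustrations, not numbers from the papers.

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: could `reported_mean` arise from n integer-valued
    observations? True if some integer total, divided by n and rounded
    to `decimals` places, reproduces the reported mean."""
    total = round(reported_mean * n)
    # Check neighboring integer totals to absorb rounding ambiguity.
    for t in (total - 1, total, total + 1):
        if round(t / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# Hypothetical examples (invented, not from the papers):
print(grim_consistent(25.56, 45))  # → True  (1150 / 45 rounds to 25.56)
print(grim_consistent(25.57, 45))  # → False (no integer total works)
```

A failed GRIM check is exactly the "number that cannot exist" described above; so far, the Surgisphere tables haven't yielded one.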
So there's another possibility that I want to discuss. What if there's a "real" (fake) dataset? This gets a bit weird to talk about publicly - sort of an "If I Did It" thing - but go with me on this...
It really isn't *that* hard to simulate data to have simple patterns that you want. And the easiest way to make these papers look convincing is to create a "real" (fake) dataset, then run "analyses" on all of the fake dataset, so they're internally consistent.
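To illustrate just how easy this is, here is a sketch of fabricating patient-level data with a baked-in treatment "effect" (all parameter values are invented for illustration; only the sample size matches the Lancet paper). Any "analysis" run on this dataset will be internally consistent by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 96_032  # matches the Lancet dataset size; everything else is invented

# Fabricate covariates and an outcome with a built-in treatment "harm".
treated = rng.random(n) < 0.15
age = rng.normal(54, 16, n).clip(18, 100)
base_logit = -3.0 + 0.03 * (age - 54)   # made-up baseline mortality risk
logit = base_logit + 0.35 * treated     # made-up log-odds increase from treatment
died = rng.random(n) < 1 / (1 + np.exp(-logit))

# "Analyses" of this dataset will all agree with each other by design:
print(f"mortality, treated:   {died[treated].mean():.3f}")
print(f"mortality, untreated: {died[~treated].mean():.3f}")
```

Tables, regressions, and subgroup analyses computed from such a dataset will never contradict one another, which is why internal consistency alone proves nothing.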
Now, a lot of folks are claiming that the solution here is "open data" - figuring that if SSD is asked to hand over the data, he just won't agree, and that's the end of the story. He won't produce the data because the data don't exist.
But...what if he does produce the data? What then?
Playing this scenario out, he can turn over the "real" (fake) dataset, say that he just needed time to make sure that it was properly de-identified and had all legal agreements or whatever, and then no amount of statistical forensics will prove that it didn't exist.
So I am a bit stuck. There seems to be an assumption that asking him to provide the data will be game over because he cannot produce the data. I don't think that is a stone-cold lock, either.
And, since I'm less pro-open-data than much of academic Twitter, I do feel compelled to point out that open data (whatever its presumed advantages) is not going to be a foolproof solution, either.
Suppose that SSD provides the "real" (fake) dataset. Do we have to take him at his word, then? Is that game over in the other direction - the data exist, ergo they are trustworthy?
Personally, to be convinced that this is legitimate, I think there has to be some evidence that the database exists as described (e.g. not just "Here is a spreadsheet with all the right numbers in it") - just because the whole arrangement seems *so* fantastical to most people with knowledge of hospital EHR systems and the legal complexities of arranging this across multiple continents. Getting 671 hospitals properly integrated into this system wouldn't just happen under the radar.
There have to be contracts, people who remember signing their hospital up for this database, IT professionals that worked on this, etc.
But that can get weird in another direction. Is it fair to demand that this specific author produce this? What if they claim this is all proprietary? Claim there is no obligation for them to share this? Ask if other studies will be held to the same standard?
(At this point I'm just kind of streaming consciousness, I don't have any good / simple / easy answers to this mess)
Anyways, trying to get back onto some coherent track here: the point is, it's not *that* hard to make up a "real" (fake) dataset, and then do all the analysis on it, which covers you from one perspective. So how can this be conclusively proven to exist (or not exist)?
Like I said above, one useful piece of evidence would be some hospitals actually saying "Yes, we have heard of this company and been part of their database since XXXX"
Another thing I've been thinking: people who are telling the truth don't mix up their story. If they are telling the truth, everything stays consistent. A good way to catch a liar is to keep them talking.
At some point you'll ask a question they did not expect. Something they didn't think of in their plan, which will trip them up when they try to explain.
And here's where we come to an interesting point that several others have picked up on. This is where I hope whoever does the audit for @NEJM and @TheLancet are paying attention.
Table S2 in the Lancet paper shows the breakdown of race categories for the entire N=96,032 dataset.
Table S3 in the Lancet paper, summary stats broken down by continent. No race variable. Where did it go? Why would you produce this big massive summary table and omit just that variable?
As my theory goes (and I emphasize that this is a THEORY, something I am trying to work out as I cross off all of the possibilities) - they made up a "real" (fake) dataset. And frankly, they were pretty good at it. But:
The thing they didn't think of: they didn't think to make the breakdown of the "race" variable look sensible by continent. They may not have expected to be asked to show a breakdown by continent at all (this could also explain the initial "Australia" gaffe)
So with their big "real" (fake) dataset, all of their tables looked pretty good, and then when they were asked to produce a breakdown of summary stats by continent, they realized this was a problem. So they just hoped to take race out of it (as others have noted, it would be very strange for dozens of countries across different continents to all record "race" data in their EHRs using conveniently American-research-style "race" groups).
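To make that tell concrete: if a fabricator samples a race variable independently of continent, every continent reproduces the same global mix, which is exactly the implausibility a by-continent table would expose. The continent and race shares below are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 96_032
# Invented continent shares, for illustration only:
continents = rng.choice(
    ["N. America", "Europe", "Asia", "S. America", "Africa", "Australia"],
    size=n, p=[0.66, 0.18, 0.05, 0.04, 0.04, 0.03])
# A fabricator who samples race independently of continent leaves a tell:
races = rng.choice(["White", "Black", "Asian", "Other"],
                   size=n, p=[0.66, 0.09, 0.14, 0.11])

# Every continent ends up with roughly the same 66% "White" share --
# absurd for a database that supposedly spans Asia and Africa.
for c in sorted(set(continents)):
    share = (races[continents == c] == "White").mean()
    print(f"{c:11s} share 'White': {share:.2f}")
```

A real multi-continent database would show strong race-by-continent dependence; near-identical shares across continents are a signature of a race variable bolted on independently of geography.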
Anyways, the lesson I am slowly and haphazardly getting to is that this story doesn't just end if he's able to cough up a dataset. Keep them talking. Ask questions about the stuff they might not have thought of.
Because, as I said above, it's actually pretty easy to create *a* fake dataset. It's harder the more things you have to get consistent and correct and aligned with reality.
Keep Current with Andrew Althouse