Having one of those mornings where you realize that it's sometimes a lot more work to be a good scientist/analyst than a bad one.

(Explanation coming...)
Processing some source data that could just be tabulated and summarized with no one the wiser, which would mean including some obviously impossible data points: dates that occurred before the study began, double entries, things of that nature.
Not exactly an original observation here, but when we talk about issues with stats/data analysis done by non-experts, this is often just as big an issue as (or a bigger issue than) whether they used one of those dumb flow diagrams to pick which analysis to do.
It would be *so* easy to blow right past the meticulous double-checking for duplicate entries and impossible dates and go straight to running summary stats and models. And I'm guessing that's often what happens. There's almost no way that ever actually gets picked up later.
I'm not sure what to do about this other than to tell people: "do careful checks of the source data, and of the cleaning and processing steps, en route to creating your final analysis dataset." But please, if you analyze data, do this.
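In that spirit, here is a minimal sketch of the kind of source-data checks being described. The field names (`subject_id`, `enroll_date`) and the study start date are invented for illustration, not taken from any real dataset:

```python
from collections import Counter
from datetime import date

# Hypothetical records: field names and the study start date are
# invented for illustration only.
STUDY_START = date(2020, 1, 15)

rows = [
    {"subject_id": "A01", "enroll_date": date(2020, 2, 1)},
    {"subject_id": "A02", "enroll_date": date(2019, 12, 30)},  # before study began
    {"subject_id": "A01", "enroll_date": date(2020, 2, 1)},    # double entry
]

def find_duplicate_ids(rows):
    """Return subject IDs that appear more than once (possible double entries)."""
    counts = Counter(r["subject_id"] for r in rows)
    return sorted(sid for sid, n in counts.items() if n > 1)

def find_impossible_dates(rows, study_start):
    """Return subject IDs whose enrollment date precedes the study start."""
    return sorted(r["subject_id"] for r in rows if r["enroll_date"] < study_start)

dups = find_duplicate_ids(rows)
bad_dates = find_impossible_dates(rows, STUDY_START)
print("possible double entries:", dups)        # → ['A01']
print("dates before study start:", bad_dates)  # → ['A02']
```

In a real pipeline, checks like these would run (and be re-run) on every refresh of the source data, before any summary stats or models.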

More from @ADAlthousePhD

16 Oct
As promised last week, here is a thread to explore and explain some beliefs about interim analyses and efficacy stopping in randomized controlled trials.
Brief explanation of motivation for this thread: many people learn (correctly) that randomized trials which stop early *for efficacy reasons* will tend to overestimate the magnitude of a treatment effect.
This sometimes gets mistakenly extended to believing that trials which stopped early for efficacy are more likely to be “false-positive” results, e.g. treatments that don’t actually work but just got lucky at an early interim analysis.
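The distinction can be sketched with a toy Monte Carlo simulation (all numbers here are illustrative assumptions, not from any actual trial): trials that cross an efficacy boundary at an interim look are, by construction, the ones whose interim estimate happened to be large, so the average estimate among early stoppers overstates the true effect even though the treatment genuinely works.

```python
import math
import random

random.seed(42)

TRUE_EFFECT = 0.30  # true standardized mean difference (illustrative)
N_INTERIM = 50      # per-arm sample size at the single interim look
Z_STOP = 2.5        # efficacy stopping boundary at the interim
N_TRIALS = 5000

# Standard error of a difference in means with unit-variance arms
se = math.sqrt(2.0 / N_INTERIM)

early_estimates = []
for _ in range(N_TRIALS):
    est = random.gauss(TRUE_EFFECT, se)  # interim effect estimate
    if est / se > Z_STOP:                # crosses the efficacy boundary -> stop
        early_estimates.append(est)

mean_early = sum(early_estimates) / len(early_estimates)
print(f"stopped early in {len(early_estimates)}/{N_TRIALS} trials")
print(f"mean estimate among early stoppers: {mean_early:.2f} vs true effect {TRUE_EFFECT}")
```

The inflation here is a pure selection effect, not a false positive: the simulated treatment really does work in every trial.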
21 Aug
OK. The culmination of a year-plus, um, argument-like thing is finally here, and it's clearly going to get discussed on Twitter, so I'll post a thread on the affair for posterity & future links about my stance on this entire thing.
A long time ago, in a galaxy far away, before any of us had heard of COVID19, some surgeons (and, it must be noted for accuracy, a PhD quantitative person...) wrote some papers about the concept of post-hoc power.
I was perturbed, as were others. This went back and forth over multiple papers they wrote in two different journals, drawing quite a bit of Twitter discussion *and* a number of formal replies to both journals.
31 Jul
Inspired by this piece which resonated with me and many others, I'm going to run in a little different direction: the challenge of "continuing education" for early- and mid-career faculty in or adjacent to statistics (or basically any field that uses quantitative methods).
I got a Master's degree in Applied Statistics and then a PhD in Epidemiology. The truth is, there wasn't much strategy in the decision - just the opportunities that were there at the time - but Epi seemed like a cool *specific* application of statistics, so on I went.
But then, as an early-career faculty member working more as a "statistician" than "epidemiologist" - I've often given myself a hard time for not being a better statistician. I'm not good on theory. I have to think really hard sometimes about what should be pretty basic stuff.
4 Jun
As more stuff continues to break on the @NEJM and @TheLancet papers using the Surgisphere 'data', there's another possibility that has occurred to me that I want to play out.
I've been poring over these numbers for a few days and have not yet found a purely "statistical" smoking gun: a mean that cannot exist, a confidence interval that can't exist, etc.
Thus far, most of the prevailing sentiment that this data isn't real seems to come from anecdotal beliefs: not very much evidence that the company exists, insider knowledge of how hard it is to connect EHR data, etc.
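As one concrete example of what a "statistical smoking gun" could look like (a sketch of a GRIM-style consistency check, not necessarily anything applied in the thread): a mean of integer-valued data with sample size n must be expressible as an integer divided by n, so some reported means are arithmetically impossible.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM-style check: can a mean reported to `decimals` places arise
    as the average of n integer-valued observations?"""
    nearest_sum = round(reported_mean * n)  # closest achievable integer total
    achievable_mean = round(nearest_sum / n, decimals)
    return achievable_mean == round(reported_mean, decimals)

# With n = 25, achievable means step by 1/25 = 0.04:
print(grim_consistent(2.48, 25))  # → True  (an integer sum of 62 works)
print(grim_consistent(2.47, 25))  # → False (no integer sum yields 2.47)
```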
2 Jun
Some excellent work here as more people pry into the @Surgisphere papers. I'm going to try to build on this a bit further...
Before we get started: many have pointed out some very legitimate reasons to be skeptical of how such a database could exist with so little record of the company's existence or infrastructure to support what would be an absolutely massive integration of EHRs around the world.
Those are good points and people should continue to pursue them. I'm coming at this from another angle: I want definitive proof, or something like it, that these data cannot exist.
1 Jun
Lots of questions being raised about @Surgisphere data analyses in @NEJM and @TheLancet. Others have already done some good work on this...
...so I'm going to focus on something curious that I noticed when I decided to actually read these papers instead of just skimming (will totally admit that I had not been paying very close attention to this until yesterday).
@Surgisphere is supposedly integrating data from hundreds of hospitals around the world, all different continents, that is supposedly The Very Biggest Data if you read most of their description.
