Poorly designed quality assessment metrics are arguably even more objectionable than poorly designed primary methods.
Two examples of VERY poorly designed quality metrics are the Newcastle-Ottawa scale and SciScore.
Compare to something like ROBINS-I (which is really good).
One of these days I should really make a quality assessment metric assessment metric (or at least write about what makes the bad ones bad)
I'll give you a start though:
No quality assessment metric can ever determine that a study is high quality or good. It can only ever detect the set of issues it is designed to detect.
"Not found to be poorly designed" does not mean high quality, ever.
Additive scores (e.g., indexes) are always misleading if not outright false (a quick sketch below illustrates why).
Items should strike at the core of the methods used to make a claim, not at shallow things easier to measure.
Ignoring subjectivity of assessment in favor of "objective" measurement is a peril.
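To make the additive-scores point concrete, here's a minimal sketch in Python (the 5-item checklist and the two studies are made up for illustration, not taken from any real instrument):

```python
# Hypothetical 5-item checklist: two studies earn the same additive score,
# but one has a fatal design flaw that the sum simply averages away.

study_a = {  # solid identification, a couple of cosmetic omissions
    "credible_identification_strategy": 1,
    "outcome_clearly_defined": 1,
    "adequate_follow_up": 1,
    "conflicts_of_interest_reported": 0,
    "protocol_preregistered": 0,
}

study_b = {  # fatal flaw: no credible identification strategy
    "credible_identification_strategy": 0,
    "outcome_clearly_defined": 1,
    "adequate_follow_up": 1,
    "conflicts_of_interest_reported": 1,
    "protocol_preregistered": 0,
}

for name, items in [("A", study_a), ("B", study_b)]:
    print(f"Study {name}: additive score = {sum(items.values())}/5")

# Both report 3/5. The index can't distinguish "missing some paperwork"
# from "the effect estimate is uninterpretable."
```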
At the risk of getting involved in a discussion I really don't want to be involved in:
Excepting extreme circumstances, even very effective or very damaging policies won't produce discernible "spikes" or "cliffs" in COVID-19 outcomes over time.
That includes school policies.
"There was no spike after schools opened" doesn't mean that school opening didn't cause (ultimately) large increases in COVID cases.
Similarly "There was no cliff after schools closed" doesn't really mean that the school closure didn't substantially slow spread.
That's one of the things that makes measuring this extremely tricky: the effects of school policies would be expected to appear gradually and to interact with local conditions over that period.
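A toy illustration of the "no spike, but big eventual effect" point (the numbers are invented, not fitted to any real outbreak): a policy that shifts the daily growth rate by a few percentage points produces no visible jump on the day it takes effect, but cases diverge several-fold over the following weeks.

```python
# Invented numbers, not real epidemic data: compare daily case counts under
# two hypothetical daily growth factors, e.g. with and without a policy.

cases_a, cases_b = 100.0, 100.0
growth_a, growth_b = 1.07, 1.03  # assumed daily growth factors

for day in range(1, 43):
    cases_a *= growth_a
    cases_b *= growth_b
    if day in (1, 7, 28, 42):
        print(f"day {day:2d}: policy A = {cases_a:7.0f}, policy B = {cases_b:7.0f}")

# Day 1 looks nearly identical under both policies (no "spike" or "cliff"),
# while by day 42 the cumulative difference is roughly five-fold.
```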
Full disclosure: I contribute every so often to the NCRC team under the fantastic leadership of @KateGrabowski and many others, and have been a fan of both NCRC and eLife since they started (well before I started helping).
At some point I'll do a long thread about why this small thing is a WAY bigger deal than it sounds, but to tease: this heralds active exploration of a fundamental and long overdue rethinking and reorganizing of how science is assessed and distributed.
"Problems with Evidence Assessment in COVID-19 Health Policy Impact Evaluation (PEACHPIE): A systematic strength of methods review" is finally available as a pre-print!
One of the most important questions for policy right now is knowing how well past COVID-19 policies reduced the spread and impact of SARS-CoV-2 and COVID-19.
Unfortunately, estimating the causal impact of specific policies is always hard(tm), and way harder for COVID-19.
There are LOTS of ways that these things can go wrong. Last fall, we developed review guidance and a checklist for how to "sniff test" the designs of these kinds of studies. Check that out here:
Broken record here, but speaking as a scientist who deals primarily with strength/quality of statistical evidence, the crux for just about everything in science lies in philosophy.
Many, if not most, statistical evidence failures come from ignoring it.
You don't need to read the complete works of 10k dead white guys, but it's incredibly valuable to dive down the "what does this even mean" rabbit holes.
Can't promise it'll make you more productive, but it will almost certainly make you a better analyst.
I am an amateur at sci phil, for what it's worth, but I make sure to engage with those who know better to steer me in the right direction.
However, beware the "critical thinker" crowd. Often overconfident BS couched in pseudo sci phil. Hard to tell the difference.
A brief thread rant on woodworking and causal inference (yeah, you read that right).
From table legs to descriptive stats tables, from picture frames to framing the big picture for studies. It's gonna get weird, but stick with me.
Let's say you want to make a very simple table. Easy! 4 legs cut to the same length and a flat top. Step 1: cut those legs.
So, you take your leg material, and you carefully measure (twice) 26", mark it, and make your cut.
And no matter how careful you were, they don't match.
You might think that you didn't measure carefully enough, or cut straight enough. I promise that's not the problem.
The problem is that you were thinking about the problem the wrong way. Because unless you are a pro, "measure twice, cut once" will NEVER get them to match.
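A minimal sketch of why (the error sizes are assumptions, roughly what a careful amateur might manage): each leg is measured, marked, and cut independently, so small independent errors guarantee the four lengths disagree, no matter how carefully each one was done.

```python
import random

# Assumed error sizes (in inches), roughly plausible for a careful amateur;
# each leg gets its own independent measuring/marking error and sawing error.
random.seed(0)
TARGET = 26.0
MEASURE_SD = 1 / 32
CUT_SD = 1 / 32

legs = [TARGET + random.gauss(0, MEASURE_SD) + random.gauss(0, CUT_SD)
        for _ in range(4)]

print("leg lengths:", [round(l, 3) for l in legs])
print("mismatch (max - min):", round(max(legs) - min(legs), 3))

# Every leg lands within a small fraction of an inch of 26", yet they still
# differ from one another; measuring more carefully shrinks the mismatch
# but never gets it to zero.
```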