Dave Blake, PhD
Check me out at bsky. OMHIWDMB.

Sep 9, 2020, 12 tweets

A thread on the Sturgis study. I want to focus on one point: how can they conclude that 266k cases are linked to Sturgis? Here is the study. 1/ ftp.iza.org/dp13670.pdf

Counties were categorized by their inflow of pings to Sturgis: how many cell-phone pings originating in that county occurred in Sturgis during the festival, compared with the prior two weeks? There were five levels of inflow, in descending order. 2/

Next, the LOG of COVID-19 cases was plotted by week for each inflow group, from high to low, in both relative and absolute terms. All three of the highest-ping county groups had increases starting about three weeks after the event, while the low-ping county groups had decreases. The changes were significant. 3/
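Plotting log cases means an equal vertical shift on the chart is an equal proportional change, whatever a group's baseline size. A minimal sketch of reading a log difference as a percentage, using made-up counts rather than the study's data:

```python
import math


def log_change(cases_before: float, cases_after: float) -> float:
    """Difference in log(cases): the vertical shift on a log-scale plot."""
    return math.log(cases_after) - math.log(cases_before)


def pct_change_from_log(delta_log: float) -> float:
    """Convert a log difference back into a percentage change."""
    return (math.exp(delta_log) - 1.0) * 100.0


# Hypothetical weekly case counts for two county groups (not from the study).
# Both grow by 25%, so both shift by the same amount on a log scale.
small_group = log_change(200.0, 250.0)
large_group = log_change(20_000.0, 25_000.0)
print(round(small_group, 4), round(large_group, 4))
print(round(pct_change_from_log(small_group), 1))  # → 25.0
```

This is why groups of very different sizes can share one chart: the slope, not the height, carries the comparison.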

Group boundaries were not uniform. The highest-ping group (outside SD) had 400+ pings but only 7 counties. 30 to 400 pings: 526 counties. 20 to 30 pings: 216 counties. 10 to 20 pings: 437 counties. 1 to 10 pings: 672 counties. 0 pings: 1,386 counties. 4/
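To make the unevenness concrete, here is a sketch that tabulates the county counts quoted above and assigns a county to a group by its ping count. The half-open bin edges are an assumption: the quoted ranges share endpoints (30, 20, 10), so how the study treats a county with exactly 30 pings is not stated here.

```python
# County counts per inflow group, as quoted in the thread (outside SD).
COUNTY_COUNTS = {
    "400+": 7,
    "30-400": 526,
    "20-30": 216,
    "10-20": 437,
    "1-10": 672,
    "0": 1386,
}

# Assumed bin edges: each bin includes its lower edge and excludes its upper edge.
BINS = [
    ("400+", 400, float("inf")),
    ("30-400", 30, 400),
    ("20-30", 20, 30),
    ("10-20", 10, 20),
    ("1-10", 1, 10),
    ("0", 0, 1),
]


def assign_group(pings: int) -> str:
    """Return the inflow group label for a county's ping count."""
    for label, low, high in BINS:
        if low <= pings < high:
            return label
    raise ValueError(f"ping count must be non-negative, got {pings}")


total = sum(COUNTY_COUNTS.values())
for label, _low, _high in BINS:
    n = COUNTY_COUNTS[label]
    print(f"{label:>7}: {n:5d} counties ({100 * n / total:.1f}% of all groups)")
```

On these counts the top group holds well under 1% of the counties, while the zero-ping group holds over 40%.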

I would certainly make the authors justify such apparently arbitrary boundaries. Use equal population totals, equal numbers of counties, quintiles of pings, or anything that doesn't look so arbitrary. 5/
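The less arbitrary groupings suggested above can be sketched in a few lines. This is a hypothetical illustration, not the study's method: counties are ranked by pings and split into k groups with roughly equal numbers of counties, or roughly equal population. Breaking ties by input order is an assumption, and it matters in practice because many counties have 0 pings.

```python
from typing import Sequence


def equal_count_groups(pings: Sequence[float], k: int = 5) -> list[int]:
    """Rank counties by pings and split them into k groups of (nearly) equal size.
    Group 0 has the fewest pings. Ties keep their input order (an assumption)."""
    n = len(pings)
    order = sorted(range(n), key=lambda i: pings[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * k // n
    return groups


def equal_population_groups(
    pings: Sequence[float], population: Sequence[float], k: int = 5
) -> list[int]:
    """Rank counties by pings and split them into k groups holding roughly
    equal shares of total population. Each county goes to the group that
    contains the midpoint of its population on the cumulative scale."""
    order = sorted(range(len(pings)), key=lambda i: pings[i])
    total = sum(population)
    groups = [0] * len(pings)
    cumulative = 0.0
    for i in order:
        midpoint = cumulative + population[i] / 2
        groups[i] = min(int(midpoint * k / total), k - 1)
        cumulative += population[i]
    return groups


# Hypothetical counties (not from the study).
pings = [0, 0, 1, 3, 5, 8, 12, 40, 90, 500]
population = [10_000] * 10
print(equal_count_groups(pings))
print(equal_population_groups(pings, population))
```

With real data, one very populous county can span more than one population share and leave a group empty, so a fuller version would need a rule for that.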

The bottom two groups are flat or declining, which looks suspiciously like p-hacking to me. 6/

Nonetheless, the effects in the other counties are strong, and highly improbable by chance (p approaching zero) for the moderate-high inflow counties. The highest-inflow group looks underpowered, and the third highest is only marginally significant. But with five groups and five time points, each with its own p-value... 7/

Marginal hits are not good enough: the experiment-wide alpha demands p < 0.002 by Bonferroni (0.05 / 25 tests). Now let's move on: how do they reach the 266k conclusion? 8/
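The Bonferroni arithmetic behind that cutoff, as a sketch. It assumes a family-wise alpha of 0.05; the 5 groups and 5 time points come from the thread.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p-values survive the Bonferroni correction."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p < cutoff for p in p_values]


n_groups, n_time_points = 5, 5
cutoff = bonferroni_threshold(0.05, n_groups * n_time_points)
print(f"per-test cutoff: {cutoff:.4f}")  # → per-test cutoff: 0.0020
```

Bonferroni is conservative when tests are correlated, as repeated weekly measurements of the same counties likely are, but it is the standard quick check.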

They simply apply each group's mean change to the number of cases in each county in that group. No worries there. The 266k, however, rests on the dubious five-group analysis with arbitrary boundaries. It really should not be that hard to run a regression across all the counties involved. 9/
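The aggregation step can be sketched like this. The field names, group labels, and the reading of each group's estimate as a proportional change applied to a county's baseline case count are assumptions for illustration; the thread does not spell out the exact formula.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str              # hypothetical identifier
    group: str             # inflow group label
    baseline_cases: float  # assumed case count without the rally


def attributable_cases(counties: list[County], group_effects: dict[str, float]) -> float:
    """Sum each county's baseline cases times its group's estimated
    proportional change in cases."""
    return sum(c.baseline_cases * group_effects[c.group] for c in counties)


# Hypothetical inputs (not the study's estimates).
counties = [
    County("A", "high", 1_000),
    County("B", "high", 500),
    County("C", "low", 2_000),
]
effects = {"high": 0.10, "low": 0.0}
print(attributable_cases(counties, effects))
```

Every county in a group gets the same multiplier, so the final total inherits whatever the group boundaries do to those multipliers.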

I would have done that first, and they probably did too. With thousands of counties, there is plenty of statistical power. There is a real signal in there, but I suspect there is also some p-hacking, and that the 266k is inflated accordingly. 10/
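A minimal sketch of the all-counties alternative, under loud assumptions: the outcome is each county's change in log cases, the regressor is log(1 + pings) so zero-ping counties stay in, and there are no controls, fixed effects, or clustered standard errors, all of which a real analysis would need. It illustrates treating pings as continuous rather than binned; it is not the study's model.

```python
import math
from typing import Sequence


def ols_fit(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Simple OLS of y on x. Returns (slope, intercept, standard error of slope)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical county data (not from the study).
pings = [0, 0, 2, 7, 15, 25, 60, 150, 420]
change_in_log_cases = [-0.05, -0.02, 0.00, 0.03, 0.04, 0.08, 0.10, 0.15, 0.22]
x = [math.log1p(p) for p in pings]
slope, intercept, se = ols_fit(x, change_in_log_cases)
print(f"slope={slope:.3f}  se={se:.3f}  t={slope / se:.1f}")
```

Using every county's ping count, instead of collapsing them into five bins, removes the boundary choice from the analysis entirely.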

Even so, I think a more rigorous analysis would still come up with a VERY large number. The weird grouping just makes me SMH. 11/

The cost estimate leans on other economic analyses that put the total cost of a case at $46k. No argument there. That's it. That's the tweet. 12/12
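The back-of-the-envelope product implied by that cost figure, using the rounded numbers quoted in the thread:

```python
cases_attributed = 266_000   # rounded figure from the thread
cost_per_case_usd = 46_000   # rounded per-case cost from the thread

total_cost_usd = cases_attributed * cost_per_case_usd
print(f"${total_cost_usd / 1e9:.1f} billion")  # → $12.2 billion
```

Because the cost is a straight multiple of the case count, any inflation in the 266k carries through to the dollar figure one for one.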
