It is getting tiresome watching media, government, and other Twitter folks present erroneous numbers for certain states’ #Covid19 positive testing percentage.
Many of the Johns Hopkins percentages are flat wrong, and it's the denominator problem (again).
🧵THREAD🧵
1/
First, it’s important to know that these aren’t "Johns Hopkins' numbers." Their testing data comes from @COVID19Tracking, which gets it from state websites.
But there are 3 options from which to choose the test number. Here are the definitions of each, in CTP’s priority order:
2/
1. "Encounters"
CTP's definition:
3/
2. "Specimens"
CTP's definition:
4/
3. "People"
CTP's definition:
5/
“Encounters” is the preferred metric because it omits intra-day serial testing, but does not disqualify those who have tested previously.
Joe gets tested 3 times today? That counts as 1.
Joe got tested in April, June, and October? Those each count as 1.
Simple.
6/
“Specimens” is the next best metric. It counts intra-day serial testing as separate tests, and does not bar those tested once from ever counting again.
Joe gets tested 3 times today? That counts as 3.
Joe got tested in April, June, and October? Those each count as 1.
7/
“People” is least preferred. Once tested a single time—ever—someone no longer counts in this metric (except if positive, I believe).
Joe got tested in April, June, and October? Counts as 1 test *in April*.
Joe gets tested 3 times today? Nope, he already tested 6 months ago!
8/
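To make the three counting rules concrete, here is a minimal Python sketch. The test records for "Joe" and the function names are made up for illustration, and the rules follow this thread's reading of CTP's definitions, not CTP's own code. The "People" rule here ignores the possible exception for positives.

```python
from datetime import date

# Hypothetical test records: (person, date of test). Joe tests once in
# April, once in June, and three times on a single day in October.
tests = [
    ("joe", date(2020, 4, 10)),
    ("joe", date(2020, 6, 15)),
    ("joe", date(2020, 10, 20)),
    ("joe", date(2020, 10, 20)),
    ("joe", date(2020, 10, 20)),
]

def count_specimens(records):
    # Every test counts, including repeats on the same day.
    return len(records)

def count_encounters(records):
    # One count per person per day: same-day repeats collapse to 1.
    return len({(person, day) for person, day in records})

def count_people(records):
    # One count per person, ever: only the first test is counted.
    return len({person for person, _ in records})

print("Specimens: ", count_specimens(tests))
print("Encounters:", count_encounters(tests))
print("People:    ", count_people(tests))
```

Same five tests, three very different totals, depending only on the counting rule.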
For an example of the disparity between the metrics, let’s look at North Dakota, which reports all 3 and so makes for a good test case:
North Dakota has had 5,604 cases for the week ending yesterday. There’s our numerator.
Here are our potential denominators...
9/
Encounters: 49,376
Specimens: 52,095
People: 11,741 (!)
Anyone spot the outlier? But why so low? Because 36.3% of ND’s population has been tested before and will never count again. In the last week, 37,635 people who took tests *would not count*.
Let’s run the numbers...
10/
% Positive using Encounters: 11.3% (This is what Johns Hopkins reports for ND)
% Positive using Specimens: 10.8%
% Positive using People: 47.7%
One of those looks a bit worse to me—and this is the way Johns Hopkins reports 16 states!
11/
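For anyone who wants to check the North Dakota arithmetic, here is a short Python sketch using only the figures quoted in this thread. The variable and function names are just illustrative.

```python
cases = 5_604  # ND cases for the week (the numerator)

# The three candidate denominators reported by ND for the same week.
denominators = {
    "Encounters": 49_376,
    "Specimens": 52_095,
    "People": 11_741,
}

def percent_positive(cases, tests):
    # Percent positive = cases / tests, expressed as a percentage.
    return 100 * cases / tests

for name, tests in denominators.items():
    print(f"% Positive using {name}: {percent_positive(cases, tests):.1f}%")

# Tests taken by people who had already been tested before, which the
# People metric leaves out of the weekly count.
excluded = denominators["Encounters"] - denominators["People"]
print(f"Tests dropped by the People metric: {excluded:,}")
```

The numerator never changes. Only the denominator does, and that alone moves the result from roughly 11% to nearly 48%.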
Here are the states that JHU reports using the “People” metric:
Some of this is on the states, since some states *only* post the People number (which is absolutely wild).
Those states? AL, AZ, IA, LA, OR, & VT
12/
10 states do report one of the other metrics, yet are still being reported with the People denominator by JHU. Why? Likely because @COVID19Tracking still hasn’t changed the number in their spreadsheet column titled “totalTestResults” to reflect Encounters or Specimens.
13/
Compare the yellow highlighted states (DE, FL & ID) with the green ones (IN & MA). The “totalTestResults” field uses “totalTestsViral” (which is Specimens) for the green, but uses “totalTestsPeopleViral” (which is People) for the yellow.
14/
Most data gatherers (like myself) use the “totalTestResults” field to update their testing numbers. I’m betting that’s exactly what Johns Hopkins does. But CTP has not yet moved the Specimens or Encounters data to that column for several states.
15/
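One way a data gatherer could sidestep the column issue is to pick the best available count directly, in CTP's priority order. This is only a sketch: "totalTestsViral" and "totalTestsPeopleViral" are the field names mentioned above, but the Encounters field name used here is an assumption, and the two rows (states "AA" and "BB") are made up.

```python
# Priority order: Encounters, then Specimens, then People.
PRIORITY = [
    "totalTestEncountersViral",  # Encounters (assumed field name)
    "totalTestsViral",           # Specimens
    "totalTestsPeopleViral",     # People
]

def pick_test_total(row):
    """Return (field, value) for the highest-priority count the row has."""
    for field in PRIORITY:
        value = row.get(field)
        if value is not None:
            return field, value
    return None, None

# Hypothetical rows: one state reports Specimens and People, one only People.
rows = [
    {"state": "AA", "totalTestsViral": 900_000, "totalTestsPeopleViral": 400_000},
    {"state": "BB", "totalTestsPeopleViral": 250_000},
]

for row in rows:
    field, value = pick_test_total(row)
    print(row["state"], field, value)
```

States that only publish the People number still end up with it, but at least the fallback is explicit and easy to flag.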
A few rogue states also do things their own way: FL, KS, and MS among them. MS lately just decided to stop updating its test count, which is why you might’ve seen a 100% positive rate from them recently. KS hides its specimen data a bit. FL provides a lot of data, but spread out.
16/
If we fixed the column issue, we would see the following percentages:
There are still some very high percentages above, of course, but not the eye-popping 30%+ absurd numbers that intelligent people just blast out there as if they couldn’t be bothered to check if this is an apples-to-apples comparison.
Frustrating, to say the least.
18/
So next time someone uncritically tells you that Idaho is over 1/3 positive, or SD is over 40%, or even "What's up with Pennsylvania and Florida in double-digits?", kindly redirect them to this thread.
19/
*Follow Up*
@COVID19Tracking has updated their FL reporting, so Johns Hopkins ought to follow suit soon (maybe tomorrow):
Ok--I said that I'd provide some more information on the issue of how I (and others) have reported testing numbers wrong in several states.
This is actually a pretty big deal: a cursory look at the wrong positive testing percentage could affect policy decisions.
1/x
People looking at the national snapshot often use @COVID19Tracking's data. I do. Their site provides easy-to-access data.
And they have updated testing data for several states in a new column on their spreadsheet, but many of us continued to use the "legacy" column.
2/x
For 24 states, using that column provides a *very* skewed version of daily testing numbers (some states more skewed than others), creating erroneously high *current* positive testing percentages.
3/x
For those who follow my daily data update, I have a major adjustment I need to make in several states. 24 out of the 50 states are being reported exactly like Florida in terms of test numbers and positive %--which is DEAD WRONG (and inflates % positive).
I had no idea it was this big a deal, but it's a very big deal. Here's a thread on why it's a big deal:
Bottom line: @COVID19Tracking, and everyone who builds graphs off of it, use data for "new tests" that does not include anyone who has ever been tested before!
CO, ND, SD, and a ton of others--their percentages testing positive are simply nowhere near as high as I've been reporting.
Hey @ianbremmer--you're using incorrect data as the denominator. To be fair, FL puts it out on the dashboard, but using the difference in the daily "Total People Tested" number is flawed, and pretty obviously so. Here's why...
First, here's your tweet and graphic. The graphic is clearly using the day-over-day difference in the Total People Tested metric, which is what @COVID19Tracking uses (last I checked), and which is wildly inaccurate when seeking out current testing.
That number measures only the number of people tested *for the first time ever*. So anyone ever tested--in April, May, June, whenever--will not show up here if they test again tomorrow. This excludes tens of thousands of tests every single day.
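A toy Python sketch of why the day-over-day difference undercounts: the person IDs and daily logs below are invented, and each person is assumed to test at most once per day.

```python
# Hypothetical daily test logs: each inner list holds the IDs of the
# people tested that day.
daily_tests = [
    ["a", "b", "c", "d"],       # day 1: everyone is new
    ["a", "b", "e", "f"],       # day 2: two repeat testers
    ["a", "c", "e", "f", "g"],  # day 3: four repeat testers
]

# Build the cumulative "Total People Tested" series.
ever_tested = set()
cumulative_people = []
for day in daily_tests:
    ever_tested.update(day)
    cumulative_people.append(len(ever_tested))

# Day-over-day difference in the cumulative series.
new_people = [cumulative_people[0]] + [
    today - yesterday
    for yesterday, today in zip(cumulative_people, cumulative_people[1:])
]
actual_tests = [len(day) for day in daily_tests]

for n, (diff, actual) in enumerate(zip(new_people, actual_tests), start=1):
    print(f"day {n}: people-tested diff = {diff}, tests performed = {actual}")
```

The difference only ever picks up first-time testers, so it falls further behind the real test count as more of the population has been tested at least once.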
The US's 7-day average number of tests as of yesterday (per @COVID19Tracking) was ~846K. The week before? ~688K. Of course, this is expected with increased testing as schools open up.
The 7-day average percent testing positive hit 4.74% yesterday—the lowest since mid-June.
2/x
Turning to state-level data, "more than half" of US states certainly did not show an increase in percent testing positive.
21 of them did. But even that number is misleading. It includes CT, NH, NJ, VT, and NM, all of which are below 3% positive (VT is 0.59%, up from 0.54%).
3/x
To give you an idea of the lag between when a Covid death occurs and when it is reported, let's look at today's report from the state of Florida, which reported 153 resident deaths on 9/16.
The picture shows "newly identified" deaths (today's) in darker purple:
So, of the 153 resident deaths Florida reported today:
27 of them occurred in the last week (17.6%)
26 of them 1-2 weeks ago (17.0%)
30 of them 2-3 weeks ago (19.6%)
20 of them 3-4 weeks ago (13.1%)
Which leaves 50 deaths that occurred more than 4 weeks ago (32.7%).
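The breakdown above is easy to recompute. A minimal Python sketch, using only the counts from Florida's report as quoted here (the bucket labels are mine):

```python
reported = 153  # resident deaths in the day's report

# Deaths by how long ago they actually occurred.
lag_buckets = {
    "last week": 27,
    "1-2 weeks ago": 26,
    "2-3 weeks ago": 30,
    "3-4 weeks ago": 20,
}
# Whatever is left over occurred more than 4 weeks ago.
lag_buckets["more than 4 weeks ago"] = reported - sum(lag_buckets.values())

for label, deaths in lag_buckets.items():
    share = 100 * deaths / reported
    print(f"{label}: {deaths} ({share:.1f}%)")
```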
On top of normal reporting lag, backlog clearing (death certificate matching) and reclassifying what counts as a death enter the mix as well.
An example of the former is Virginia's reported deaths today (45) and yesterday (96). The VDH website provided the following disclaimer:
I made additional changes to Florida’s data to make it more accurate than my previous change, and much more accurate than what @COVID19Tracking adds to the US total, which drops everyone who has ever been tested before from the current test number (the denominator).
I reached out to @COVID19Tracking to pass along the pitfalls of using the day-over-day difference in "total people ever tested once" as the current number of people tested today. The worst part about using this as a denominator is that it gets worse over time.
As of today, there are 4.8M+ people in Florida who, if they are tested again tomorrow, next week, or next month, will not count using the less accurate data. With tens of thousands of them testing daily, it's really skewing the test numbers (and the resulting % testing positive).
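To see why this gets worse over time, here is a rough Python sketch. It assumes every positive still lands in the numerator while repeat testers are dropped from the denominator, and the 5% "true" rate and the repeat shares are hypothetical numbers chosen for illustration.

```python
def people_based_positivity(true_positivity, repeat_share):
    """Percent positive when repeat testers are dropped from the denominator
    but every positive still counts in the numerator (an assumption)."""
    if not 0 <= repeat_share < 1:
        raise ValueError("repeat_share must be in [0, 1)")
    return true_positivity / (1 - repeat_share)

true_positivity = 5.0  # hypothetical constant rate, in percent

# As more of today's testers have been tested before, the reported
# number drifts further from the true one.
for repeat_share in (0.0, 0.25, 0.5, 0.75):
    inflated = people_based_positivity(true_positivity, repeat_share)
    print(f"{repeat_share:.0%} repeat testers -> {inflated:.1f}% reported")
```

Nothing about the virus changes in this sketch. The reported percentage climbs purely because the pool of never-tested people keeps shrinking.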