In today's much-discussed @nytimes story from @apoorva_nyc (nytimes.com/2021/05/03/hea…) there is a graph that I find quite problematic. It purports to show county-level data about vaccine hesitancy. [Image]
But look at how sharp those state boundaries are.

One of the key insights from our @callin_bull course is that in the real world, data are messy. And if they come out too clean, something is wrong.

This one screams that something is wrong. [Image]
So what's going on here?

The @nytimes graphic appears to come from this HHS/CDC report.

aspe.hhs.gov/pdf-report/vac… [Image]
Basically what this is doing (and I don't have a complete grasp, so I'm open to corrections) is taking what I believe to be state-level survey data and then interpolating county-level responses based on a demographic model.
While this might be the best-you-could-do for certain modeling exercises, in my opinion it's irresponsible when used for data visualization in a public-facing forum.

The reason is that it provides a false impression of certainty and resolution.
The visualization makes it look like the data have been compiled (or at least are accurate) at the county level.

Not at all—and we see this in the huge state-boundary effects.
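
To make the state-boundary point concrete, here is a toy sketch. Every number in it is made up for illustration, and the "state value plus a small demographic adjustment" rule is only a simplified stand-in for whatever the HHS/CDC model actually does. It just shows that if county values are built this way, counties within a state barely differ while the jump at a state line is large.

```python
# Hypothetical numbers throughout: none of these come from the HHS/CDC report.
from statistics import mean

# Assumed state-level survey estimates of hesitancy (fraction of adults).
state_estimate = {"State A": 0.12, "State B": 0.27}

# Assumed small demographic adjustments for five counties in each state.
county_adjustment = [-0.02, -0.01, 0.0, 0.01, 0.02]


def synthetic_counties(state_value, adjustments):
    """County value = state survey value + small demographic adjustment."""
    return [round(state_value + a, 3) for a in adjustments]


counties = {
    state: synthetic_counties(value, county_adjustment)
    for state, value in state_estimate.items()
}

# Largest difference between any two counties inside the same state.
within_state_spread = max(max(c) - min(c) for c in counties.values())

# Difference between the two states' average county values.
between_state_gap = abs(mean(counties["State A"]) - mean(counties["State B"]))

print(counties)
print(f"largest within-state spread: {within_state_spread:.2f}")
print(f"gap between state averages:  {between_state_gap:.2f}")
```

With these assumed inputs the gap between states is several times the spread within either state, which is the kind of pattern that shows up on a map as sharp state borders.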
In my view, presenting synthetic county-level data in this way is straight-up *visual bullshit*.

Bullshit, in our @callin_bull definition, tries to impress or persuade with a disregard for the accuracy of the information conveyed.

This seems to fit the bill perfectly.
To better understand what is going on here and how the model that generated these data works, take a look at this thread (h/t @bhrenton).

So it looks like I missed the boat in my criticism of this graph.

No, I wasn't wrong. Everything I said was correct.

But there is a WAY bigger problem with these data.

@NatMakesMaps lays it out. I believe he is correct.

As best I can tell, the @nytimes managed to double-count the people who are strongly vaccine-opposed, leading to massive overcounts of vaccine hesitancy in the US and maps that are pure nonsense instead of merely under-supported.
According to @NatMakesMaps, the map should look like this. Is that correct, @nytimes? If so, I eagerly await your correction notice. If not, I await your explanation of the anomalies that Nat points out. [Image]

More from @CT_Bergstrom

13 May
When you extrapolate from data about within-group values to the existence of between-group differences.

Via @MikeTaddow [Image]
(To explain a bit more, the people ranking BBQ joints in Seattle are not the same people ranking them in Brownsville, TX. These data tell us that Seattleites are nice when they rank things and/or have low standards for BBQ, not that we are a contender on the national stage.)
I now really want to see the rankings for best pizza, using the same absurd metric.
12 May
Before anyone panics, note (1) the selection bias arising because this is an example picked out of the various outbreak case reports as being "worrisome", (2) the small sample size, and (3) these numbers still give you a point estimate of 84% effectiveness against infection.
In a bit more detail: As small case clusters arise and are reported worldwide, we expect to see a distribution of effectiveness estimates. Some will have more vaccinated cases by chance, some fewer. The smaller the clusters, the wider the distribution.
Singling out a small cluster that yields a low effectiveness estimate for some variant of concern—and ignoring all the other data on that variant of concern everywhere else in the world—is reckless, and, odds are, misleading.
30 Apr
Today a story has been going around about a cluster of B.1.617 cases in Israel. This is the India-associated strain.

Unfortunately, this is in some places being spun as a possible example of vaccine escape. But the numbers suggest exactly the opposite!

timesofisrael.com/children-from-…
Here are the numbers.

24 with recent travel history.
17 with no travel history
5 children
4 vaccinated

Approximately 85% of the adult population in Israel has been fully vaccinated. So what does this tell us about vaccine effectiveness against B.1.617 in adults?
I'll just do point estimates.

Assume the 5 children were <16 and thus unvaccinated.

That gives us 32 cases among unvaccinated adults, and 4 cases among vaccinated adults.

The basic calculation for effectiveness then gives us a remarkable 98% against B.1.617.
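
A minimal sketch of that arithmetic, for anyone who wants to check it. The thread does not spell out "the basic calculation", so the formula below is an assumption: effectiveness as one minus the ratio of attack rates in vaccinated versus unvaccinated adults, using the 85% coverage figure. The function and variable names are just illustrative.

```python
def vaccine_effectiveness(cases_vax, cases_unvax, frac_vax):
    """Point estimate: 1 - (attack rate vaccinated / attack rate unvaccinated).

    The adult population size cancels out of the ratio, so only the
    vaccinated fraction is needed. Assumed formula, not quoted from the thread.
    """
    rate_vax = cases_vax / frac_vax            # cases per unit of vaccinated adults
    rate_unvax = cases_unvax / (1 - frac_vax)  # cases per unit of unvaccinated adults
    return 1 - rate_vax / rate_unvax


# Numbers reported in the thread.
total_cases = 24 + 17          # with and without recent travel history
children = 5                   # assumed <16 and thus unvaccinated
vaccinated_adult_cases = 4
unvaccinated_adult_cases = total_cases - children - vaccinated_adult_cases

ve = vaccine_effectiveness(vaccinated_adult_cases, unvaccinated_adult_cases, 0.85)
print(f"unvaccinated adult cases: {unvaccinated_adult_cases}")
print(f"point estimate of effectiveness: {ve:.0%}")
```

This is a point estimate only. With 36 adult cases the uncertainty around it is wide, and the sketch makes no attempt to quantify that.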
27 Apr
1. Today’s antivax propaganda comes from a... vaccine manufacturer?

Unfortunately, yes. The manufacturer of the Sputnik V vaccine is tweeting absolutely nonsensical statistics in an effort to question the safety record of its competitors. [Image]
2. Their unfounded claim is that we are observing higher death rates among Pfizer recipients.

This is rubbish. In our book, we address the way in which people will try to bamboozle you with the unwarranted authority of numbers by throwing lots of stats at you.
3. But statistics (1) are only as good as the methods used to derive them, and (2) are only useful when they allow you to make fair and meaningful comparisons.

The Sputnik V numbers fail spectacularly on both counts.
27 Apr
Osprey and dinner
Crows arrive on the scene.

"Wait, how much will you give me if I ride him?"
The approach.
24 Apr
Genomics and the poetry of racist injustice:

Let's start with the poetry, because if you read that, it doesn't matter one iota whether you make the connection to genomics.

Please, please take a moment and read this. Slowly, aloud, and more than once.

newyorker.com/magazine/2020/…
What does this have to do with genomics?

To pack a huge amount of information into very small genomes, viruses make use of overlapping reading frames. From Bergstrom and Dugatkin (2016), the HBV genome:
We present an extremely stupid example of what this would look like using three-letter English words instead of codon triplets.
