Are high levels of existing COVID-19 population immunity in US counties associated with a lower infection rate in this current wave?

This thread contains my latest findings on this question.

Initial conclusion: No, there is practically no correlation.
The plot above shows the percentage of the population infected before Sep 1 & after Sep 1 in each county (based on covid19-projections.com estimates).

The question is: can knowing the % infected before Sep 1 in a county predict the relative severity of this current wave?
When looking at all 3,000+ counties, the answer is no. There is practically no correlation (R^2 = 0.002) between the % infected before Sep 1 and after Sep 1.

So given a county, the COVID-19 prevalence before Sep 1 has no predictive value in determining the severity since Sep 1.
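A minimal sketch of how an R^2 like the one above can be computed for a single predictor. The arrays here are synthetic stand-ins drawn independently of each other, not the covid19-projections.com estimates, and the names `r_squared`, `pct_before` and `pct_after` are illustrative assumptions.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a one-variable linear fit, i.e. the squared Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)

# Synthetic data only: two independent draws, so R^2 should be close to zero.
rng = np.random.default_rng(0)
pct_before = rng.uniform(1, 25, size=3000)  # hypothetical % infected before Sep 1
pct_after = rng.uniform(1, 25, size=3000)   # hypothetical % infected after Sep 1

print(f"R^2 = {r_squared(pct_before, pct_after):.4f}")
```

For one predictor, R^2 from a least-squares line equals the squared correlation coefficient, which is why no explicit regression is needed here.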
Perhaps there are small counties that may be outliers, so let's remove those.

Even if you ignore counties with fewer than 50,000 residents, the pattern is consistent. There is virtually no correlation.

The pattern seems to hold no matter what population threshold you set.
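The threshold check can be sketched roughly as follows. This assumes a per-county population array alongside the two prevalence arrays; the function name `r2_by_min_population`, the threshold values other than 50,000, and all data are made up for illustration.

```python
import numpy as np

def r2_by_min_population(population, pct_before, pct_after, thresholds):
    """Return {threshold: (n_counties, R^2)} keeping counties at or above each threshold."""
    population = np.asarray(population)
    pct_before = np.asarray(pct_before, dtype=float)
    pct_after = np.asarray(pct_after, dtype=float)
    results = {}
    for t in thresholds:
        keep = population >= t
        n = int(keep.sum())
        if n < 3:
            # Too few counties for a meaningful correlation.
            results[t] = (n, float("nan"))
            continue
        r = np.corrcoef(pct_before[keep], pct_after[keep])[0, 1]
        results[t] = (n, float(r ** 2))
    return results

# Synthetic stand-in data, not the real county estimates.
rng = np.random.default_rng(1)
population = rng.integers(1_000, 2_000_000, size=3000)
pct_before = rng.uniform(1, 25, size=3000)
pct_after = rng.uniform(1, 25, size=3000)

sweep = r2_by_min_population(population, pct_before, pct_after, [0, 50_000, 500_000])
for t, (n, r2) in sweep.items():
    print(f"min pop {t:>9,}: n={n:4d}, R^2={r2:.4f}")
```

Reporting the county count next to each R^2 matters, because higher thresholds leave fewer points and a noisier estimate.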
In fact, in the plot above, the slope of the best fit line is positive, meaning that a higher prevalence in a county is correlated with an even more severe fall outbreak.

But of course, the correlation is very weak, so I would not focus on the slope too much.
The correlation improves marginally when you break it down by state. But the slope actually tends to be more positive, rather than negative.

Meaning higher prevalence before Sep 1 -> worse outbreak after Sep 1.

37 / 50 states have a + slope (plots of CA, TX, FL & NY below).
This suggests that within an individual state, counties that have low prevalence before Sep 2020 will continue to have relatively lower prevalence after Sep.

Ofc, there are many other factors at play that can explain this. But population immunity doesn't seem to be one of them.
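The per-state breakdown could look something like the sketch below: fit a separate line for each state and count how many slopes are positive. The two states and their values are invented purely to show the mechanics, and `slopes_by_state` is a hypothetical helper, not the code behind the plots.

```python
import numpy as np
from collections import defaultdict

def slopes_by_state(states, pct_before, pct_after):
    """Fit pct_after = slope * pct_before + intercept separately within each state."""
    groups = defaultdict(list)
    for state, x, y in zip(states, pct_before, pct_after):
        groups[state].append((x, y))
    slopes = {}
    for state, points in groups.items():
        x, y = np.array(points, dtype=float).T
        if len(x) < 2 or x.max() == x.min():
            continue  # a line needs at least two distinct x values
        slope, _intercept = np.polyfit(x, y, 1)
        slopes[state] = float(slope)
    return slopes

# Made-up example: three counties in each of two fictional states.
states = ["AA", "AA", "AA", "BB", "BB", "BB"]
before = [2.0, 5.0, 8.0, 3.0, 6.0, 9.0]
after = [4.0, 6.0, 9.0, 7.0, 5.0, 2.0]

slopes = slopes_by_state(states, before, after)
n_positive = sum(s > 0 for s in slopes.values())
print(f"{n_positive} / {len(slopes)} states have a positive slope")
```

States with a single county, or with identical prevalence in every county, are skipped since no slope can be fitted for them.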
Before doing this analysis, I would have expected a weak negative correlation between high existing COVID-19 prevalence and the severity of the current wave.

But the data does not support this hypothesis, and hence I must revise my prior and beliefs.
I urge others who still believe in the strategy of using natural herd immunity to control future outbreaks to look at the data and see for themselves. Current evidence does not support this strategy.

The raw data is available on GitHub: github.com/youyanggu/covi…
We see people cite Florida and California as two counterexamples, but they are outliers.

We should also look at Tennessee & Arizona, which are currently experiencing large outbreaks despite large summer waves. Or Oregon & Vermont, where spread has consistently been contained.
That said, it is still important to do case studies on California and Florida so that we can better understand why Florida was seemingly able to "flatten the curve" for this current wave, while California was not.

This is despite Florida having significantly fewer restrictions.
I also tried using the population as the predictor and computed the correlation.

It seems that more populous counties had marginally smaller outbreaks (in relative terms) than less populous counties.

This also goes against the notion that population centers drive infections.
These are preliminary findings. I am in the process of examining other variables that can better predict the relative severity of the current wave (e.g. income, intervention level, weather, etc).

If you have any other promising variables (for which there are data), let me know.
UPDATE: I'm glad this thread spurred some discussion. Thank you all for responding. I have done some follow-up analysis based on some of the feedback. I am addressing the following:

- Using deaths instead of % infected
- Outliers
- Min threshold for % infected
- Non-linearity
1) Deaths instead of % infected

Deaths are not as good a predictor because they lag by many weeks and small counties do not have many deaths.

If we use deaths for this analysis, we can see that the correlation is still fairly weak (with NYC as an outlier).
2) Outliers

Many people tried to refute my conclusion by using NYC as an example that "population immunity works".

But this exact analysis shows that NYC is an outlier, not the norm. We should not make broad generalizations based on outliers. Too many are making this mistake.
I highlighted the four counties in and around NYC in the plot below. They have high prevalence prior to Sep 1 and relatively low transmission since.

They appear to be outliers. Using those data points to support the claim that high prevalence leads to lower infections is flawed.
If you look at all US counties with more than 1M in population, you can see the outliers even more clearly.

Once again, there does not seem to be a correlation between existing prevalence and the severity of the current wave.
Those who only use NYC as an "example" are ignoring Miami-Dade. 6% of the entire county tested positive by Sep 1 (~25% infected).

One would expect Miami to be able to suppress another wave. But another 6% of the county tested positive since then.

Similar situation in Phoenix.
The purpose of this analysis is to look at *all counties*, not just a few cherry-picked ones.

So picking a single data point (like NYC) doesn't refute these findings. In fact, it only highlights the need for broader studies to avoid falling into confirmation bias.
3) Threshold for % infected

Some people suggested adding a threshold for % infected before Sep 1, as population immunity effects may not kick in until some sort of minimum is reached.

Below, I re-ran the analysis but only on counties with >20% infected before Sep 1.
It's possible that population immunity does not "kick in" until we get to 30-50%+. But there are simply not enough counties in this category to confirm this hypothesis.

In either case, as experts have been saying all along, the only path to herd immunity is through vaccination.
4) Non-linearity: Some people mentioned that it's possible that the relationship is not linear. That's true.

But for most practical purposes, if there is little to no linear correlation, it doesn't help much to start considering nonlinear ones, especially for a single variable.
In any case, here are the residuals for the linear regression. The residuals are fairly normally distributed, which is good.

There is some heteroskedasticity in the data (i.e. the variance of the residuals is not constant across x). But that should not bias the fitted slope.
You can also take the log of the input data to get a more homoscedastic plot. But as you can see, the correlation remains poor.
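A rough sketch of the residual check and the log transform, under stated assumptions. The data below are synthetic and deliberately built with multiplicative noise so that the spread grows with x on the raw scale; the helper names and the split-at-the-median comparison are illustrative choices, not the method used for the plots.

```python
import numpy as np

def fit_and_residuals(x, y):
    """Least-squares line through (x, y); returns slope, intercept and residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(slope), float(intercept), residuals

def residual_spread_by_half(x, residuals):
    """Std. dev. of residuals below and above the median x (a crude heteroskedasticity check)."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    low = float(residuals[x <= median].std())
    high = float(residuals[x > median].std())
    return low, high

# Synthetic data with multiplicative noise (an assumption made for illustration).
rng = np.random.default_rng(2)
x = rng.uniform(1, 25, size=3000)
y = 3 * np.sqrt(x) * rng.lognormal(mean=0.0, sigma=0.6, size=3000)

_, _, raw_res = fit_and_residuals(x, y)
raw_low, raw_high = residual_spread_by_half(x, raw_res)
print(f"raw scale: spread {raw_low:.2f} (low x) vs {raw_high:.2f} (high x)")

_, _, log_res = fit_and_residuals(np.log(x), np.log(y))
log_low, log_high = residual_spread_by_half(x, log_res)
print(f"log scale: spread {log_low:.2f} (low x) vs {log_high:.2f} (high x)")
```

If the two spreads differ a lot on the raw scale but are close after taking logs, the log transform has made the residuals more homoscedastic. It does not by itself make a weak relationship stronger.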
That's it for this update. I'm always open to more suggestions.

By the way, if you follow me because I present an unbiased analysis, then please don't be surprised/dismissive if not all my findings agree with your beliefs/priors. Nobody is right all the time, including myself.

