Youyang Gu

29 Dec, 18 tweets, 7 min read

A lot of people have asked why the CDC estimates close to 100M total US COVID-19 infections (28%) by Dec 1, while covid19-projections.com only estimates 58M (17%).

I believe there are major flaws in the CDC estimates, which I will explain in this thread.

cdc.gov/coronavirus/20…

To begin, the covid19-projections.com model is tuned on serology surveys, while the CDC model is not.

While CDC estimates 7x more COVID-19 infections than reported, covid19-projections.com estimates this ratio to currently be ~3x, down from 10x in April and 4x in the summer.

Using the CDC claim that "1 in 7 total infections were reported", this would imply that 70% of North & South Dakota were infected, which doesn't pass a common sense test.
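The sanity check above can be sketched in a few lines. This is only illustrative arithmetic: the 70% figure and the 7x multiplier come from the thread, and the reported share is back-computed from them rather than taken from any case dataset.

```python
# Illustrative arithmetic only; inputs are the figures quoted in the thread.
MULTIPLIER = 7                 # "1 in 7 total infections were reported"
implied_infected_share = 0.70  # share of the Dakotas implied to be infected

# Reported cases per capita that would produce the 70% figure.
reported_share = implied_infected_share / MULTIPLIER

# Any region with more than this share of residents reported as cases
# would be implied to have over 100% of its population infected.
breaking_point = 1 / MULTIPLIER

print(f"Reported share implied by the thread: {reported_share:.0%}")
print(f"A flat 7x multiplier exceeds 100% once reported cases pass {breaking_point:.1%}")
```

A flat multiplier leaves little headroom: it stops making sense once about one in seven residents has a confirmed case.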

While a 7x multiplier is believable in the spring, the paper still claims this is the case in September.

The main issue with the CDC paper results lies with their estimate of non-hospitalized infections. Let's take a look.

They claim that 26-40% of non-hospitalized, symptomatic individuals sought care/testing. And out of those individuals, only 43-58% completed a COVID test.

First of all, the 26-40% of individuals that seek care is based off of a third-party data source called COVID Near You.

It is entirely self-reported, and over the past 3 weeks, only 402 people reported COVID-19 symptoms. In that time span, there were 4M cases. So coverage=0.01%.
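The coverage figure is a single division. A quick check, using only the two numbers quoted in the tweet above:

```python
# Figures quoted in the thread (past ~3 weeks).
symptom_reports = 402         # self-reported symptomatic users on COVID Near You
confirmed_cases = 4_000_000   # US confirmed cases over the same span

coverage = symptom_reports / confirmed_cases
print(f"coverage = {coverage:.2%}")
```

This compares symptom reports with confirmed cases, so it is a rough indicator of sample size, not a formal sampling fraction.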

Next, the 43-58% test completion percentages are only based on *outpatient emergency department visits*.

The paper makes a questionable logical leap: that *all* non-hospitalized symptomatic individuals who *already sought care/testing* complete tests at the same low 43-58% rate.

We cannot use the testing completion percentage of emergency department visits as a proxy for the fraction of all symptomatic individuals who complete a COVID-19 test.

If my interpretation is correct, I believe this fundamental flaw compromises the result of the paper.

The paper's test completion percentage estimate is also based solely on a private dataset from IBM Explorys, making the results unreproducible.

It doesn't disclose how many data points were used to generate the estimates. For all we know, it could be 100 or 100,000.

Furthermore, the ranges for the test completion percentages are so wide they're practically meaningless.

For example, the paper claims the test completion percentage of non-hospitalized individuals between ages 18-49 can be anywhere between 6% and 99%. Not very helpful...

To compute the detection rate, the paper multiplies the care/test-seeking rate by the test completion percentage and factors in a ~11% false negative rate and a 0% false positive rate, arriving at a detection rate of ~15%, or 1 in 7 infections.

It appears that the CDC site used this 7x multiplier and applied it to the number of confirmed cases at end of November (13 million) to get 91 million total infected.
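A rough sketch of the calculation described above, for the non-hospitalized pathway only. The range endpoints are the ones quoted in the thread. The midpoints and the function name `detection_rate` are assumptions made for illustration, and the paper's actual estimation procedure may differ.

```python
def detection_rate(care_seeking, test_completion, sensitivity):
    """Fraction of infections that end up as a confirmed case (simplified)."""
    return care_seeking * test_completion * sensitivity

SENSITIVITY = 1 - 0.11  # ~11% false negative rate, 0% false positives

# Endpoints of the ranges quoted in the thread.
low = detection_rate(0.26, 0.43, SENSITIVITY)
high = detection_rate(0.40, 0.58, SENSITIVITY)

# Midpoints of those ranges (an assumption, used only for illustration).
mid = detection_rate(0.33, 0.505, SENSITIVITY)

print(f"detection rate: {low:.1%} to {high:.1%}, midpoint {mid:.1%}")
print(f"implied multiplier at midpoint: {1 / mid:.1f}x")

# Applying the rounded 7x multiplier to ~13 million confirmed cases.
total_infected = 13_000_000 * 7
print(f"implied total infections: {total_infected / 1e6:.0f} million")
```

The midpoint lands near 15%, matching the "1 in 7" figure, and 13 million confirmed cases times 7 gives the 91 million total.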

These estimates were then picked up and disseminated by the public & media (understandably, since it's the CDC).

In reality, if an individual reported to have visited a testing center, the test completion percentage is likely close to 100%, not 43-58%.

Hence, the multiplier should be lowered by a factor of 2, from ~7 to ~3.5, making it consistent with the covid19-projections.com model.
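To see roughly where the ~3.5 comes from, the same simplified product can be rerun with the test completion percentage set to 100%. The 33% care-seeking midpoint and the 50.5% completion midpoint are assumptions for illustration, not values taken from the paper.

```python
SENSITIVITY = 1 - 0.11   # ~11% false negative rate
CARE_SEEKING = 0.33      # assumed midpoint of the 26-40% range

# Paper's approach: roughly half of care seekers complete a test.
paper_detection = CARE_SEEKING * 0.505 * SENSITIVITY

# Thread's argument: nearly everyone who visits a testing center is tested.
adjusted_detection = CARE_SEEKING * 1.0 * SENSITIVITY

paper_multiplier = 1 / paper_detection
adjusted_multiplier = 1 / adjusted_detection

print(f"paper-style multiplier: {paper_multiplier:.1f}x")
print(f"adjusted multiplier:    {adjusted_multiplier:.1f}x")
```

Under these assumptions the multiplier drops by about half, which is consistent with the ~3.5x stated above.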

The paper also claims that only 1 in every 2.5 COVID-19 hospitalizations is reported as COVID-19. Hence, they imply that the majority of COVID-19 hospitalizations go unreported, even in September.

That also seems unlikely, but that is a separate discussion.

The entirety of these results hinges on data of unknown quality from two third-party data sources, which are not publicly available, making the results irreproducible.

Furthermore, one of the data sources is composed entirely of self-reported data from an online, non-random population.

If my interpretation is correct, I'm surprised this paper passed peer review and is now featured on the CDC website.

These results have major problematic implications. E.g.:

- A much lower IFR (0.4%)
- 40% total infected currently (50% by February)

Even though some people want to believe in those implications, they are likely not true, at least based on the current data.

I hope this adds more clarity to why covid19-projections.com estimates differ from the CDC. If new evidence comes to light, I will adjust accordingly.

With all the resources, data, & expertise at the disposal of the CDC, I hope to see more sophisticated methods used to estimate the true prevalence of COVID-19 in the US.

Ideally it would take advantage of all the serology surveys that the CDC has done: cdc.gov/coronavirus/20…

With all that said, the CDC has some of the world's brightest scientists and experts. I learned a lot from their work over the past year.

As with all science, new findings should be scrutinized and held to the same rigorous standards. That's always been my goal.

More from @youyanggu

Youyang Gu

@youyanggu

15 Dec

https://twitter.com/Bob_Wachter/status/1333966348972539904

If we vaccinate 10 million people today, statistically 300 of them will die the very next day, regardless of whether they were actually vaccinated.

Over the next months, it's important to watch for misinformation that blames adverse events on the vaccine.

https://twitter.com/Timcast/status/1337043496230932483

Below is an example of the misinformation that can spread.

The annual incidence of Bell's palsy is ~25 per 100k. There were 4 cases out of 40k participants.

The FDA concluded it's "consistent with the expected background rate in the general population."

In statistics, this is a simple application of something called Bayes' rule.

In essence, we must consider the likelihood of an event happening independently.

For ex: a 90-year-old has a 1 in 6 chance of dying within a year. So this happening after a vaccine would not be unusual.
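The background-rate reasoning in these tweets can be sketched as below. The inputs are the figures quoted above. The helper `expected_background_events` is a hypothetical name, and scaling an annual rate linearly by days is a simplifying assumption. The trial's follow-up length is not given here, so the Bell's palsy figure is shown per full year.

```python
def expected_background_events(population, annual_rate, days):
    """Expected events with no vaccine effect, scaling the annual rate linearly."""
    return population * annual_rate * days / 365

# Bell's palsy: ~25 per 100k per year among 40k trial participants.
bells_palsy_per_year = expected_background_events(40_000, 25 / 100_000, days=365)

# "300 of 10 million die the next day" implies this daily death probability.
implied_daily_death_rate = 300 / 10_000_000

# A 90-year-old's 1-in-6 annual risk, converted to a per-day probability.
daily_risk_age_90 = 1 - (1 - 1 / 6) ** (1 / 365)

print(f"Expected Bell's palsy cases per year in 40k people: {bells_palsy_per_year:.0f}")
print(f"Implied daily death rate: {implied_daily_death_rate:.6f}")
print(f"Daily death risk at age 90: {daily_risk_age_90:.5f}")
```

With roughly 10 cases expected per year in a group that size, 4 observed cases is not surprising on its own.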

Read 4 tweets

Youyang Gu

@youyanggu

14 Dec

@Cat_Ho

Many people are unaware that the COVID-19 vaccine has significantly more side effects than the flu vaccine. I hope to see more honest discussions regarding this.

Props to @Cat_Ho for her realistic, data-centric reporting of this issue. It's much needed.

sfchronicle.com/health/article…

Some notable numbers from the vaccine trial participants after the 2nd dose (age 16-55):

- 16% developed a fever vs 0% for the placebo
- 59%/52% had fatigue/headache vs 23/24% for placebo
- 45% took pain medication vs 13% for placebo

Those age 55+ have a slightly lower rate.

https://twitter.com/JesseOSheaMD/status/1337091268866953219

In comparison, roughly 1% of flu shot participants report a fever (16x lower), and ~20% report fatigue/headache (2x lower).

On top of that, COVID-19 vaccine participants have to go through this twice. Though the side effects are milder after dose #1.

Read 12 tweets

Youyang Gu

@youyanggu

11 Dec

By many accounts, the US will have 100 million vaccine doses by February.

I estimated yesterday that we need ~100 million people to gain immunity via vaccination to reach herd immunity.

So *theoretically*, we can reach herd immunity by March if we vaccinate the right people.

This involves allocating the initial (limited) supply of vaccines based on two main criteria:

1) Each individual receives only one dose instead of two.
2) We prioritize individuals who have not had a prior infection.

This would be temporary, until supply catches up.

https://twitter.com/zeynep/status/1337047714341785605

There is some evidence, though inconclusive, that even one dose of the vaccine can have reasonable efficacy (potentially >80%).

Read 15 tweets

Youyang Gu

@youyanggu

10 Dec

I launched a new page that shows the path to US COVID-19 herd immunity: covid19-projections.com/path-to-herd-i…

It's built on the assumption that herd immunity will be achieved via vaccination and natural infection.

Tl;dr version: I estimate a "return to normal" by June/July 2021.

The underlying methodology is a simple model that simultaneously simulates daily vaccinations and new infections through 2022.

By May/June 2021, I estimate vaccinations to exceed 1 million people per day as they become available to the general public.

By mid-summer 2021, I estimate roughly 1/2 of the population will have been vaccinated & 1/3 of the population will have been infected.

After accounting for overlap/loss of immunity, this amounts to ~60% of the population possessing immunity to the virus, sufficient for herd immunity.
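A rough illustration of the overlap adjustment follows. It assumes vaccination and prior infection are independent, which is an assumption made here for the sketch and not necessarily how the actual model works. The remaining gap to ~60% would then come from factors such as loss of immunity.

```python
vaccinated = 1 / 2   # share vaccinated by mid-summer 2021 (from the thread)
infected = 1 / 3     # share previously infected (from the thread)

# Bounds on the share with at least one source of immunity.
no_overlap = vaccinated + infected        # nobody is in both groups
full_overlap = max(vaccinated, infected)  # every infected person is also vaccinated

# Assumption: the two groups overlap as if independent.
independent = 1 - (1 - vaccinated) * (1 - infected)

print(f"no overlap:   {no_overlap:.0%}")
print(f"independent:  {independent:.0%}")
print(f"full overlap: {full_overlap:.0%}")
```

The thread's ~60% estimate sits between the full-overlap and independent cases, consistent with a further deduction for waning immunity.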

Read 8 tweets

Youyang Gu

@youyanggu

9 Dec

The COVIDhub Ensemble model that combines all the models did not perform well over the past 2 months.

This is due to the fact that the majority of model submissions did not properly forecast this current wave.

Roughly half of all models failed to beat the baseline.

This is a known issue with pandemic modeling. For most scenarios, it's beneficial for models to make forecasts close to the status quo (since that's usually true).

This means they're accurate a majority of the time, but they will miss large spikes such as this current wave.

On the flip side, if a model predicts a large spike and is wrong, it will be heavily penalized by most evaluation metrics. This can happen even if the spike does happen but is a few weeks early/late.

That's the dilemma a lot of modelers face, including myself earlier this year.

Read 7 tweets

Youyang Gu

@youyanggu

3 Dec

I posted the methodology for the new covid19-projections.com nowcasting model:

covid19-projections.com/estimating-tru…

I'm going to do a layman's summary here, and hopefully receive some feedback from #epitwitter.

https://twitter.com/youyanggu/status/1291092311045283841

I've adjusted the methodology that I posted back in August based on new data and research:

Disclaimer: this is still a simple heuristic and hence is not perfect. There are more advanced methods (e.g. see covidestim.org).

The basic idea is this: for each day, we try to estimate the ratio of true infections to reported cases that day.

We call this the prevalence ratio, and we model this ratio as a function of the day and positivity rate:

Read 16 tweets
