I don't think there's a way to say this diplomatically, but I think it's important to tell the truth:

I have zero confidence in the Santa Clara serology study, and the recent work of Eran Bendavid generally.
What do I see as the major problems?

1. Poorly chosen, non-representative samples

2. Weakness at thinking through important Bayesian principles

3. Inconsistency with other, more verifiable results

4. Attachment to a particular idea (slow willingness to update on evidence)
Point 1. The gold standard for selecting a sample is to take entirely random people you know nothing about. In particular, your sampling technique can't have any obvious relationship with the variable you're testing for.

This is, unfortunately, hard.
For example, unfortunately for scientists, the people who volunteer for a COVID serology testing project are more likely than non-volunteers to be interested in seeing their results, perhaps because they think they had it.
This same problem shows up in the Bendavid WSJ op-ed from last month: the remarkable stretch of applying the NBA's rate of positives to the country as a whole.

The NBA wasn't a randomly-selected employer; it was the first sports league to get a COVID positive.
Point 2. The tests aren't perfect. Most tests produce some false positives and some false negatives.

Therefore, you need to be extremely mindful of a basic Bayesian principle, especially when those without COVID vastly outnumber those with it (and they do).
The principle is this: if negatives outnumber positives, and your test produces some false positives, then your reported positives are still pretty likely, or even very likely, to be true negatives.

(There's some math to calculate this exactly if you're interested.)
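Here's a minimal sketch of that math in Python. The prevalence, sensitivity, and specificity values below are illustrative assumptions chosen to make the point, not figures from the study:

```python
# Bayes' theorem: probability that a reported positive is a true positive
# (the positive predictive value). All input numbers here are assumptions.
def positive_predictive_value(prevalence, sensitivity, specificity):
    true_pos = prevalence * sensitivity                 # P(test+, infected)
    false_pos = (1 - prevalence) * (1 - specificity)    # P(test+, not infected)
    return true_pos / (true_pos + false_pos)

# Assume 1% of the population is infected, with a test that has
# 80% sensitivity and 99.5% specificity (all assumed values).
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.80, specificity=0.995)
print(f"P(infected | positive test) = {ppv:.1%}")
# -> about 61.8%: even with a 99.5%-specific test, well over a third
#    of the reported positives are actually true negatives.
```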
The test used in the Santa Clara study has been run on 401 known negative samples (from pre-COVID times) and produced 2 false positives.

This is a good record (99.5% specificity), but unfortunately the sample size is not large enough to establish that figure with confidence. The true specificity could be as low as 98.2%.
In the actual serology testing of Santa Clara residents, 1.5% of samples came back positive.

See the problem here? Given what we know of the accuracy of the test, you could still get that 1.5% result even on an entirely COVID-free sample and it wouldn't be particularly surprising.
The study's reported lower bound is much too high. (Some of the higher-than-expected value comes from population weighting, which is legitimate, but I cannot replicate the math that gets you a lower bound that high.)
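For the curious, here's how to reproduce those bounds. This is my sketch using an exact (Clopper-Pearson) binomial interval via scipy; it's one standard way to do the calculation, not necessarily the study's:

```python
# Specificity bound from 2 false positives in 401 known-negative samples,
# via an exact (Clopper-Pearson) binomial confidence interval.
from scipy.stats import beta

n, false_pos = 401, 2

# 95% Clopper-Pearson upper bound on the false-positive rate:
# Beta(1 - alpha/2; x + 1, n - x), with x observed false positives.
fp_rate_upper = beta.ppf(0.975, false_pos + 1, n - false_pos)

print(f"Point-estimate specificity: {1 - false_pos / n:.1%}")  # 99.5%
print(f"95% lower bound:            {1 - fp_rate_upper:.1%}")  # 98.2%
print(f"False-positive rate could be as high as {fp_rate_upper:.1%}")  # ~1.8%
```

Since the upper bound on the false-positive rate (about 1.8%) exceeds the raw positive rate the study observed (1.5%), a sample with zero true infections could plausibly have produced the study's result.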
Point 3. This ongoing thesis of a huge, huge number of untested positives and a very low death rate is inconsistent with much more salient and clear evidence, like the actual deaths in New York or Lombardy.
We have 12,000 COVID deaths in NYC, per this report from yesterday. Let's assume this is generally in the right ballpark.

If we take the estimate above, a 0.12% to 0.2% death rate from the virus, as a given, what number of total cases do we get in NYC?

nypost.com/2020/04/17/nyc…
The answer is you get 12,000/0.002, or 6 million, as a lower bound, and 12,000/0.0012, or 10 million, as an upper bound. And these numbers will presumably continue to rise with NYC deaths.

These results are--again, let me say what I mean--preposterous.
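A quick sanity check makes the point. The death count and death-rate bounds come straight from the numbers above; NYC's five-borough population of roughly 8.4 million is the one outside fact added here:

```python
# Implied total NYC infections if the study's death rate were correct.
nyc_deaths = 12_000                  # per the NY Post report cited above
ifr_low, ifr_high = 0.0012, 0.002    # the 0.12%-0.2% death rate, as given
nyc_population = 8_400_000           # approximate five-borough population

implied_min = nyc_deaths / ifr_high  # higher death rate -> fewer implied cases
implied_max = nyc_deaths / ifr_low
print(f"Implied infections: {implied_min:,.0f} to {implied_max:,.0f}")
print(f"NYC population:     {nyc_population:,}")
# -> 6,000,000 to 10,000,000 implied infections; the upper bound is
#    more people than live in the city.
```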
You could do similar calculations for Lombardy, Madrid, San Marino, or other hard-hit places.

The failure to grapple with any of these counterexamples--where the evidence on the ground is just obviously inconsistent with the study--does not inspire confidence.
Point 4. This one is perhaps more meta than substantive. But IMO, Bendavid is extraordinarily slow to update his beliefs in the face of new evidence.

The March 24 op-ed posited a super-low mortality rate and predicted few deaths. Since then, the worldwide death toll has grown by a factor of 10.
To come back a month later and publish essentially the same viewpoint, just slightly less extreme--rather than going back to the drawing board and re-evaluating assumptions--is just contrary to the fundamental idea of hypothesis testing and Bayesian updating.
Meta-concern 1:

"Alan, you're not an expert."
True, but graduate-level statistics (or maybe even precocious-undergrad statistics) is all you need here. If you want, I can link to people in more-biological fields with similar critiques.

Evaluating arguments on merits >> credential games.
Meta-concern 2:

"Alan, this seems contrary to the norms of being nice and collegial"
Yeah. A few reasons for that.

First, because getting this right is important.

Second, collegiality norms can inhibit that: more mild-mannered communication may not get the message across.
People within a field are often constrained by collegiality. I'm seeing a lot of posts from more junior researchers about this study that read like "while I am glad to see serology brought into the fold, I just have this slight concern that..."

You see what they're doing.
They're being as harsh on it as they feel that they can, after the norms are taken into account.

I'm outside the field, and I don't care whether people think I was too mean. So I can say it: the stats here are bad, folks. Really bad.
Final reason for being blunt rather than reserved: lots of public-facing people have faced criticism lately for not saying what they truly believe. (For example, "prepping" for themselves in February while "reassuring" the public simultaneously.)

So I'm saying what I believe.
Addendum 1: This is a really strong look at how to improve the confidence intervals and figure out what we can learn from the data.

I believe he's much closer to the correct statistical approach than the original study.

Addendum 2: Some replies are interested in my motives here. I'm not sure why, as presumably my arguments are of the same quality regardless of my preferences.

However, for what it's worth, I'm a middle-ground person, neither hostile to reopening nor demanding it immediately.
Addendum 3: some other threads or posts I have found helpful. If it matters, these people are more qualified for traditional peer review of the paper than I am, and in some cases have additional points.

medium.com/@balajis/peer-…
Addendum 4: Lots of you asking why 10 million cases in New York City is a preposterous number.

The answer is that it is greater than the total number of people who live there. "New York City" here, in all uses, refers to the combined New York, Bronx, Kings, Queens, and Richmond counties, not the MSA.
Addendum 5: "I think the authors owe us all an apology."

- Andrew Gelman, director of the Applied Statistics Center at Columbia University stat.columbia.edu/~gelman/
Sorry, linked his bio not his blog. The blog is here. statmodeling.stat.columbia.edu/2020/04/19/fat…