Health Nerd
17 Sep, 25 tweets, 5 min read
It's been coming up a lot lately, so I thought I'd do a bit of a thread on CONVENIENCE SAMPLES and why they aren't great for assessing POPULATION PREVALENCE of a disease

In other words - how many people have had COVID-19?

2/n So, the basic idea here is simple. We want to know about people who have (or in this case, have had) a disease

How do we find that out?
3/n The traditional method is to do a large, randomly-sampled study: dialing up tens of thousands of people across a population, surveying them + doing lots of blood tests

But this is EXPENSIVE
4/n Running a proper statistically representative process, getting all the people to answer their phones and give you bloods... even if the cost per person is low, multiply that by 10,000-100,000 and the cost can be prohibitive
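To see how quickly the bill adds up, here is a rough sketch of the arithmetic. The per-person cost of 50 is a made-up illustrative figure, not a number from this thread; the only point is that total cost grows in step with the number of people sampled.

```python
# ASSUMPTION: illustrative per-person cost (recruitment, survey, blood draw, lab test).
# This figure is invented for the example; it is not from the thread.
COST_PER_PERSON = 50.0


def survey_cost(n_people: int, cost_per_person: float = COST_PER_PERSON) -> float:
    """Total cost of a survey, growing linearly with the number of people sampled."""
    return n_people * cost_per_person


for n in (10_000, 100_000):
    print(f"{n:,} people -> {survey_cost(n):,.0f}")
```

Even a cheap test becomes a six- or seven-figure exercise once it is multiplied across a representative sample.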
5/n Which brings us to the idea of a CONVENIENCE SAMPLE

Why is it called a convenience sample? (hint: the answer is in the name)
6/n Yes, convenience samples are just that - convenient

Usually, they are groups of people you are ALREADY TESTING for some other reason, so you can add another test or a survey on top
7/n I have used this method in the past to look at the burden of diabetes in hospitals and GP clinics - we looked at people who were already getting blood tests, and added one extra test for diabetes (and science!)…
8/n But there's an issue here

We have selected these people very specifically. They are not a random, representative sample - they were people ALREADY GETTING blood tests which means they are probably different in LOTS OF WAYS to the general population
9/n So in our lovely study of a convenience sample of diabetes tests, we can't say anything about how much diabetes there is in the community (population prevalence)!

All we can talk about is diabetes IN THE PATIENTS TESTED
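A toy calculation shows how far apart the two numbers can be. Every figure below (group sizes and the prevalence in each group) is hypothetical, not data from the diabetes study; it just assumes that people already getting blood tests are sicker on average than everyone else.

```python
# HYPOTHETICAL numbers for illustration only: they are NOT from the diabetes study.
groups = {
    # group name: (number of people, true prevalence of diabetes in that group)
    "already getting blood tests": (10_000, 0.20),
    "everyone else": (90_000, 0.05),
}


def prevalence(group_names) -> float:
    """Pooled prevalence across the named groups (total cases / total people)."""
    cases = sum(groups[g][0] * groups[g][1] for g in group_names)
    people = sum(groups[g][0] for g in group_names)
    return cases / people


sample_prev = prevalence(["already getting blood tests"])  # what the convenience sample sees
population_prev = prevalence(groups)                        # iterating a dict yields its keys
print(f"convenience sample: {sample_prev:.1%}")      # 20.0%
print(f"whole population:   {population_prev:.1%}")  # 6.5%
```

The sample is perfectly accurate about the people tested; it simply answers a different question from "how much diabetes is there in the community?"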
10/n "But God", you ask, with a common autocorrect mistake, "what does this have to do with COVID-19?"

Well, reader, this is where we get to antibody testing
11/n You see, when you get sick with a new disease, your body produces antibodies*

We can then test for these antibodies to see if you've had the disease before*

*oversimplified, plz don't murder me immunologists
12/n If you run an antibody test on a large group of people, it's called a serosurvey (because antibody tests are also known as serology in sciency terms)
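At its simplest, a serosurvey result is just the share of people tested who came back antibody-positive. The sketch below adds a Wilson score interval (a standard way to put a confidence interval on a proportion); the 380-out-of-10,000 result is hypothetical. Note that the interval only captures random sampling error: it cannot tell you whether the people tested represent anyone else.

```python
import math


def seroprevalence(positives: int, tested: int, z: float = 1.96):
    """Proportion testing positive, with a Wilson score interval (~95% for z=1.96).

    The interval reflects random sampling error only; it says nothing about
    whether the people tested are representative of the wider population.
    """
    p = positives / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half_width = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return p, centre - half_width, centre + half_width


# HYPOTHETICAL serosurvey: 380 antibody-positive results out of 10,000 people tested.
p, lo, hi = seroprevalence(380, 10_000)
print(f"{p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```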
13/n Now, a lot of places (countries, states, colleges) have run serosurveys and had a grand old time of it. This is why you keep seeing those news articles saying that x% of people in a place have had COVID-19 already
14/n The problem is, some of these serosurveys used CONVENIENCE SAMPLES

Just like we discussed earlier, that makes them a bit problematic
15/n My co-authors and I, in our systematic review of age-stratified infection-fatality rates (IFRs) for COVID-19, looked into just how problematic

The answer: a whole lot…
16/n For example, one study in Tokyo that used a CONVENIENCE SAMPLE found that 3.8% of the people in its sample had had COVID-19

But a proper randomized sample found just 0.1% - 38 times lower!
17/n In England, a CONVENIENCE SAMPLE of blood donors implied that 1 in 12 people had had COVID-19, but a large representative sample found it was just 1 in 20
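The size of the overestimate in both examples is simple division of the figures quoted above. Exact fractions are used here so the ratios come out without floating-point noise.

```python
from fractions import Fraction

# Prevalence estimates quoted in the thread, as exact fractions.
tokyo_convenience = Fraction(38, 1000)     # 3.8%
tokyo_random = Fraction(1, 1000)           # 0.1%
england_donors = Fraction(1, 12)           # blood donors: 1 in 12
england_representative = Fraction(1, 20)   # representative sample: 1 in 20


def overestimate(convenience: Fraction, representative: Fraction) -> Fraction:
    """How many times higher the convenience-sample estimate is."""
    return convenience / representative


print(overestimate(tokyo_convenience, tokyo_random))                   # 38
print(f"{float(england_donors):.1%} vs {float(england_representative):.1%}")  # 8.3% vs 5.0%
print(float(overestimate(england_donors, england_representative)))     # 1.6666666666666667
```

So the English blood-donor figure was about two thirds higher than the representative estimate, while the Tokyo convenience sample was off by a factor of 38.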
18/n The problem is, these CONVENIENCE SAMPLES are systematically biased. They are of people who are different to the general population in ways that can be very difficult to measure and/or understand
19/n Blood donors, for example, are young and healthy by design. But the people who have been (generously) giving blood during the pandemic might also be...well, a bit odd
20/n They're going to great personal lengths to sacrifice for the rest of us ungrateful buggers, which might indicate that they're more likely to socialize, more likely to mingle, and thus more likely to get infected
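A small simulation makes this mechanism concrete. It is a sketch under invented assumptions: one trait (being "very social") raises both the chance of infection and the chance of donating blood. Every probability below is made up for illustration. Under this model the population prevalence is expected to be 6.4%, while donors are expected to show about 8.7%, even though the test itself is perfect.

```python
import random

# HYPOTHETICAL model, numbers invented for illustration:
# being "very social" raises BOTH the chance of infection and the chance of donating.
GROUPS = [
    # (share of population, P(infected), P(donates blood))
    (0.30, 0.12, 0.10),  # very social
    (0.70, 0.04, 0.03),  # less social
]


def simulate(n_people: int = 200_000, seed: int = 42):
    """Return (prevalence in whole population, prevalence among blood donors)."""
    rng = random.Random(seed)
    weights = [share for share, _, _ in GROUPS]
    infected_total = donors = infected_donors = 0
    for _ in range(n_people):
        # Assign this person to a group according to the population shares.
        _, p_infected, p_donates = rng.choices(GROUPS, weights=weights)[0]
        infected = rng.random() < p_infected
        donates = rng.random() < p_donates  # independent of infection within a group
        infected_total += infected
        if donates:
            donors += 1
            infected_donors += infected
    return infected_total / n_people, infected_donors / donors


population_prev, donor_prev = simulate()
print(f"whole population: {population_prev:.1%}")  # expected value under the model: 6.4%
print(f"blood donors:     {donor_prev:.1%}")       # expected value under the model: ~8.7%
```

Nothing in the donor data itself flags the problem; you would only spot it if you already knew how sociability relates to both donating and infection.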

21/n And this is the problem with convenience samples, generally

We cannot use them to estimate population prevalence (how many people have had COVID-19), because they aren't representative of society as a whole
22/n So if you see a headline that says "x% of people infected with COVID-19!" take a leaf out of my mentor's book and ask:


It's a vitally important question

I use them in my research. They are brilliant for quick, cheap tracking of rates of infection IN SELECT GROUPS

They also provide a brilliant window into change OVER TIME
24/n For example, if you sample blood donors every week for a year, you've got an amazing insight into the changing nature of the pandemic
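One way to see why a biased sample can still track change over time: if the bias stays roughly constant (a big assumption), week-to-week growth in the donor series matches growth in the population even though the level is wrong. The weekly figures and the 1.7x bias below are hypothetical.

```python
# HYPOTHETICAL weekly seroprevalence in the whole population (not real data).
true_weekly = [0.010, 0.012, 0.015, 0.019, 0.024]
BIAS = 1.7  # ASSUMPTION: donors read 1.7x too high, and this stays constant over time

donor_weekly = [BIAS * p for p in true_weekly]


def weekly_growth(series):
    """Ratio of each week's prevalence to the previous week's."""
    return [later / earlier for earlier, later in zip(series, series[1:])]


print([round(g, 3) for g in weekly_growth(true_weekly)])   # [1.2, 1.25, 1.267, 1.263]
print([round(g, 3) for g in weekly_growth(donor_weekly)])  # [1.2, 1.25, 1.267, 1.263]
```

If the kind of person who donates changes over the year, the bias changes too and the trend can be distorted, so this only holds while the sample stays similar from week to week.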

25/n You just can't use those results to tell how many people in the rest of society have gotten COVID-19

But that doesn't mean the results aren't helpful at all

• • •


More from @GidMK

22 Sep
One of the weirdest parts of the pandemic has been watching economists, who I always assumed could read and understand graphs, fail to read and understand graphs
If nothing else, without limiting the deaths from all causes to countries with active outbreaks, this makes absolutely no sense whatsoever
For another, the source appears to be...a climate scientist from France? How did this person get accurate death data for worldwide fatalities with no lag? Or are they using some historical dataset or extrapolation?
22 Sep
As a resident of NSW, have to say this is a remarkable achievement. Well done @NSWHealth!
For context for international followers, the state managed to get down to zero locally acquired cases while RELAXING restrictions through a test/trace approach and some health promotion

Quite remarkable
Worth noting that this is just one day, chances are there's still ~some~ local transmission, but to have the numbers go from doubling within a week to 0 is still quite impressive!
21 Sep
Perhaps unsurprisingly for a blog called "lockdown skeptics", this piece makes basic mathematical and epidemiological mistakes. In fact, very few positive COVID-19 tests are false
The basic error presented here is the assumption that all PCR tests are run on a random population sample of the UK, for which the prevalence is 1/1000

This is inaccurate
Most PCR tests in the UK (and everywhere) are run on the SUSPICION of COVID-19
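The point about suspicion is Bayes' theorem: the share of positive results that are real (the positive predictive value) depends heavily on how likely people were to be infected before testing. The sensitivity and specificity below are illustrative assumptions, not measured values for UK PCR testing, and the 10% pre-test probability is hypothetical.

```python
# ASSUMPTION: illustrative test accuracy, not measured values for UK PCR testing.
SENSITIVITY = 0.80   # P(test positive | infected)
SPECIFICITY = 0.999  # P(test negative | not infected)


def positive_predictive_value(prevalence: float,
                              sensitivity: float = SENSITIVITY,
                              specificity: float = SPECIFICITY) -> float:
    """Bayes' theorem: P(infected | test positive) for a given pre-test prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Random population sample at the blog's assumed prevalence of 1 in 1,000.
print(f"random sample:       {positive_predictive_value(1 / 1000):.1%}")  # 44.5%
# HYPOTHETICAL pre-test probability of 10% among people tested on suspicion.
print(f"tested on suspicion: {positive_predictive_value(0.10):.1%}")      # 98.9%
```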

21 Sep
A study recently made massive, international news for reportedly showing that normal glasses may protect against COVID-19

Let's do a brief peer-review on twitter, because this is wild 1/n
2/n The study is here. Altmetric of >2,000, hundreds of news articles about it already…
3/n The basic idea of the study is simple - we know that COVID-19 can be spread through droplets. Sometimes these droplets might go into eyes. Wearing glasses might prevent this, so do people who wear regular corrective glasses get COVID-19 less than people who don't?
20 Sep
Lots of people have asked the question: why is COVID-19 more fatal in one place than another?

Our paper largely answers this question - it is mostly explained by age!…
For example, in the US:

Utah has the lowest IFR in the country, with our estimate putting it at exactly 0.5%

Indiana has a much higher IFR, at roughly 1.1%
But this is LARGELY explained by differences in the age breakdown of infections - in Utah ~50% of all infections were in people <45yo when we ran our analysis compared to 40% in Indiana
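A sketch of why age mix matters: the overall IFR is a weighted average of age-specific IFRs, weighted by how infections are spread across ages. The age bands, IFRs and infection mixes below are made up and are NOT the paper's estimates; two places with identical age-specific risk still end up with different overall IFRs.

```python
# ASSUMPTION: made-up age-specific IFRs and infection mixes, NOT the paper's estimates.
AGE_SPECIFIC_IFR = {"<45": 0.0002, "45-64": 0.004, "65+": 0.04}

# Share of all infections falling in each age band (each mix sums to 1).
infection_mix = {
    "state A (younger infections)": {"<45": 0.50, "45-64": 0.35, "65+": 0.15},
    "state B (older infections)": {"<45": 0.40, "45-64": 0.35, "65+": 0.25},
}


def overall_ifr(mix: dict) -> float:
    """Overall IFR = sum over age bands of (share of infections x age-specific IFR)."""
    return sum(share * AGE_SPECIFIC_IFR[band] for band, share in mix.items())


for state, mix in infection_mix.items():
    print(f"{state}: overall IFR {overall_ifr(mix):.2%}")
```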
17 Sep
The CDC estimates that the SYMPTOMATIC CASE-FATALITY RATE (CFR) for influenza is ~0.1%

The estimate of the INFECTION-FATALITY RATE (IFR) is closer to ~0.05%, due to asymptomatic flu cases
It's VERY HARD to compare the CFR of influenza to the CFR of COVID-19, because the DENOMINATORS (number of cases identified, which depends on who gets tested) are very different

But we CAN compare the IFRs
The POPULATION IFR of COVID-19 is (very crudely) 0.5-1%, although this varies enormously with the age breakdown of people infected by the disease

So, about 10-20x higher than seasonal influenza
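The "10-20x" figure follows directly from dividing the crude IFRs quoted above. Exact fractions avoid floating-point rounding in the ratio.

```python
from fractions import Fraction

# Crude figures quoted in the thread, in percent.
FLU_IFR = Fraction("0.05")
COVID_IFR_RANGE = (Fraction("0.5"), Fraction("1"))

low, high = (ifr / FLU_IFR for ifr in COVID_IFR_RANGE)
print(f"COVID-19 IFR is roughly {low}x to {high}x the seasonal flu IFR")  # 10x to 20x
```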