Can that tell us anything about when it started in Wuhan?
🧵
Covid case numbers in Wuhan grew exponentially, up until the city was locked down on January 23rd:
You can fit an exponential curve to the data, and it follows a very simple equation. That curve says that cases were doubling every 3.8 days.
If you project this curve backwards, you get a starting date for the Wuhan outbreak in late November 2019.
(you have to make some assumptions about the case detection rate)
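A minimal sketch of that back-projection, assuming pure exponential growth with a constant doubling time. The anchor point (100 diagnosed cases per day on January 8th) and the 3.8-day doubling time come from this thread. The function name and the `detection_fraction` parameter are hypothetical, and the default of 1.0 is a placeholder, not an estimate.

```python
import math
from datetime import date, timedelta

def back_project(anchor_date, anchor_cases_per_day, doubling_days,
                 target_cases_per_day=1.0, detection_fraction=1.0):
    """Date at which the fitted curve would pass through target_cases_per_day.

    detection_fraction is a hypothetical knob: the share of true infections
    that show up as diagnosed cases. Lower values push the date earlier.
    """
    true_cases = anchor_cases_per_day / detection_fraction
    doublings = math.log2(true_cases / target_cases_per_day)
    # Subtracting from a date ignores the fractional part of the day count.
    return anchor_date - timedelta(days=doublings * doubling_days)

# One *diagnosed* case per day, if every case were detected (placeholder assumption).
start = back_project(date(2020, 1, 8), 100, 3.8)
print(start)
```

With every case counted as detected, this only tells you when the curve of diagnosed cases crosses one per day. A lower detection fraction, or a lag between infection and diagnosis, moves the date earlier, which is where the assumptions mentioned above come in.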
Since the doubling time is close to 3.5 days, you can approximate this by saying that covid case numbers double twice every week.
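To see how much the "double twice every week" shortcut differs from the fitted curve, here is a small sketch that converts a doubling time into a weekly growth factor. The function name is just for illustration.

```python
def weekly_growth_factor(doubling_days):
    """Multiplicative growth in case numbers over 7 days for a given doubling time."""
    return 2 ** (7 / doubling_days)

for d in (3.5, 3.8):
    print(d, round(weekly_growth_factor(d), 2))
```

A 3.5-day doubling time is exactly 4x per week, while the fitted 3.8 days works out to roughly 3.6x per week, so the shortcut runs slightly fast.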
I offered this simplified table, in the Rootclaim debate.
This isn't exactly right -- the growth rate is probably a bit slower than this, case numbers would be more random at the beginning, and we might end up with half as many deaths by January 24th.
But I think this still gets you to within a factor of 2 of the correct numbers.
I didn't share this for its precision; I presented it to make the math easier to follow.
Complicated simulations of the covid epidemic (like Pekar et al. 2022) also place the start date in late November 2019.
Many people struggle to read papers on those computer models, but most people can multiply by 4, so this math should be easier to understand.
I still get some criticism from people who feel like this is just too fast.
Intuitively, they feel like the virus had to be in Wuhan for two months longer.
Therefore, maybe they think the Wuhan case data is fake?
Or maybe they think it's misleading: perhaps we're detecting a higher fraction of cases at the end and fewer cases at the beginning, so it just looks like the pandemic is growing quickly.
One thing we can do is compare this to other early covid outbreaks. The first big European covid outbreak was in Lombardy, in Northern Italy.
Lombardy is a region, not a single city. The outbreak hit several cities across Lombardy.
But Lombardy does have a population around 10 million, which is fairly close to the population of Wuhan.
So, maybe we can still compare the case counts.
I gathered some data for both regions and compared the growth in cases:
I didn't make any careful effort to line these up. I just shifted the dates so that the day each place hits 100 cases per day matches. That's January 8th in Wuhan and February 18th in Lombardy.
After matching that one point, the curves otherwise just happened to line up quite well.
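The alignment amounts to a single date shift. A minimal sketch, using only the two dates given in this thread (the variable and function names are hypothetical):

```python
from datetime import date

# Dates on which each region reached about 100 cases per day (from this thread).
wuhan_100 = date(2020, 1, 8)
lombardy_100 = date(2020, 2, 18)

# Number of days the Lombardy curve has to be shifted back to overlay Wuhan.
offset_days = (lombardy_100 - wuhan_100).days

def align_to_wuhan(lombardy_date):
    """Map a Lombardy calendar date onto the Wuhan timeline."""
    return lombardy_date - (lombardy_100 - wuhan_100)

print(offset_days)
```

Only that one point is forced to match. Everything else about how the two curves line up comes from the data.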
So it looks like there's nothing abnormal about the case data in Wuhan. It's just following a normal growth rate for the virus.
This study on Lombardy estimates the R0 of the virus is around 3.
Using an R0 of 3 in my epidemic simulations gives a doubling time about the same as what was seen in Wuhan. sciencedirect.com/science/articl…
This is interesting because the Lombardy outbreak was spreading in several smaller cities at the same time, not just as one big outbreak in one city:
I don't think there's any law saying that covid cases have to grow at exactly that rate, but it's interesting that they do.
I think the actual rate will vary by city density. It slowed down in 2020, as people reacted. But it also sped up, with new variants of the virus.
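The link between an R0 of about 3 and a doubling time of a few days can be sketched with a back-of-the-envelope formula. This is not the epidemic simulation mentioned above. It assumes every infection is passed on after one fixed generation interval, and the 5- and 6-day intervals below are illustrative assumptions, not values from this thread.

```python
import math

def doubling_time(r0, generation_interval_days):
    """Doubling time if cases multiply by r0 once per (fixed) generation interval."""
    growth_rate = math.log(r0) / generation_interval_days  # per day
    return math.log(2) / growth_rate

for tg in (5.0, 6.0):
    print(tg, round(doubling_time(3.0, tg), 1))
```

Under those assumptions, an R0 of 3 gives a doubling time somewhere between 3 and 4 days, the same range as the Wuhan and Lombardy curves.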
So, when was the first case in Wuhan, and the first case in Italy?
We can say when exponential growth started in Wuhan based on genetics.
When the virus first spills over, there's no diversity.
As more people get infected, there are more mutations.
Since we know how fast it mutates, we can work backwards to guess when it started.
Doing this math in Wuhan gives a time to most recent common ancestor (TMRCA) around December 11th.
That's also when we started finding patients at Huanan market. So, exponential growth in Wuhan most likely started at that market.
We can also use genetic diversity to figure out the TMRCA in Italy.
In Wuhan:
TMRCA = December 11th.
One month later, on January 8th, you have 100 cases diagnosed per day.
In Lombardy:
TMRCA = January 20th.
One month later, on February 18th, you have 100 cases diagnosed per day.
This comparison makes the Wuhan data look normal.
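The "one month later" figures can be checked with simple date arithmetic. The dates are the ones listed above; the dictionary layout and function name are only for illustration.

```python
from datetime import date

# (TMRCA, date of ~100 diagnosed cases per day), as given in this thread.
milestones = {
    "Wuhan": (date(2019, 12, 11), date(2020, 1, 8)),
    "Lombardy": (date(2020, 1, 20), date(2020, 2, 18)),
}

def days_from_tmrca(region):
    """Days between the TMRCA and the day the region hit ~100 cases per day."""
    tmrca, hundred_per_day = milestones[region]
    return (hundred_per_day - tmrca).days

for region in milestones:
    print(region, days_from_tmrca(region))
```

That gives 28 days for Wuhan and 29 days for Lombardy.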
And this helps clarify what we're talking about in Wuhan -- there was not some large epidemic in Wuhan, of which some small part was detected at Huanan market.
Rather, a small outbreak at the market grew to infect all of Wuhan.
That alone does not prove that the outbreak started at Huanan market.
There could still have been a few cases before the market outbreak, before the TMRCA. science.org/doi/10.1126/sc…
When was the actual first case in Italy?
Just like in Wuhan, that's hard to answer with certainty. There are lots of contradictory data points, because it's hard to find covid cases retrospectively and because some studies report false positive tests.
In Wuhan, the first proven case was a vendor at the Huanan market, with symptom onset around December 10th.
But there were some papers that described early cases, and some scientists find that very suspicious:
Most of these early cases have since been ruled out, for various reasons:
But you've also got a study claiming that lots of people had Covid in Italy, back in September 2019:
You can basically just tell that result is wrong from looking at the numbers. 16% of people had covid, but then it went down to 0, and then back up to 16? pubmed.ncbi.nlm.nih.gov/33176598/
There's also one study finding covid in Italian wastewater in mid December.
Unlikely to be a valid result -- if there were enough covid cases for that, you'd also expect to find a growing outbreak. archive.is/h3JTN
There's a similar early wastewater sample from Brazil, and that one can be ruled out based on sequencing: sars2.net/origin.html#A_…
Besides the lack of recorded Italian cases in December or early January, you can also look at Google search trends to see when a large outbreak had started in each country. onlinelibrary.wiley.com/doi/10.1002/al…
Finding the start of a growing outbreak is easy, based on techniques like TMRCA and case growth rates.
In Wuhan, that was early December at the Huanan market. In Italy, it was late January.
Finding the exact first case is harder. We can't prove who that was, in Wuhan or in Italy.
Early covid cases in Wuhan were centered on the Huanan market.
Is that because the pandemic started there?
Or were those cases just found with a biased search?
The 2021 WHO report mapped out the home addresses of covid cases from December 2019. China reported 174 cases in December 2019. 1/3 of them were linked to the Huanan market, the other 2/3 were not.
A 2022 paper by Michael Worobey and other scientists analyzed all these case addresses and noticed that even though many of these cases were not linked to the market (blue dots), they were still clustered around the market and were centered almost perfectly on it.
Interesting post from Scott Alexander about H5N1, which tries to estimate the odds of a pandemic happening next year: astralcodexten.com/p/h5n1-much-mo…
Some prediction markets have an H5N1 pandemic at about 35%.
Scott argued that the odds of it happening next year are probably closer to 5%.
I think that's probably the right order of magnitude. In an interview with Robert Wright, I guessed it was in the 5-10% range.
We also talked about how this compares to the risk of an H5N1 pandemic coming from a lab.
A thread on the raccoon dog DNA found at the Huanan market, and what those samples can and cannot prove about covid's origins:
The story of how the raccoon dog DNA data came to light is almost as interesting as the data itself.
Huanan market was shut down at the end of 2019, after a novel virus (covid) was found spreading among the employees. Over the next few weeks, Chinese scientists took 900+ samples at the market.