Others have worked on this problem, but there's a fundamental issue with trying to quantify variants: the mutations that define variants are so far apart that they never appear on the same read, and often not even on the same molecule(!) in heavily degraded wastewater RNA 2/
Luckily @jasmijnbaaijens had a critical insight: this is computationally identical to RNAseq quantification! You have an unknown mixture of transcripts (known variants) which you've chopped up and noisily turned into sequencing reads, and are inferring the original mixture 3/
When we tested using kallisto for prediction on simulated data, it worked really, really well. There are some systematic over- and under-estimates, which speaks to the importance of choosing your reference set well, but otherwise, it works 4/
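The mixture-deconvolution idea can be sketched in a few lines. This is a toy illustration, not the kallisto EM pipeline the thread actually used: the mutation signatures and mixture proportions below are made up, and it solves a least-squares problem over site frequencies rather than an EM over read likelihoods.

```python
import numpy as np

# Hypothetical mutation signatures: rows are genome positions, columns are
# variant lineages; entry (i, j) is 1 if variant j carries mutation i.
signatures = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
], dtype=float)

true_mix = np.array([0.6, 0.3, 0.1])      # hidden variant proportions
observed = signatures @ true_mix          # per-site alt-allele frequencies
observed += np.random.default_rng(0).normal(0, 0.01, size=observed.shape)

# Ordinary least squares, clipped to non-negative and renormalized.
# kallisto instead runs EM over read-to-reference compatibility, which is
# what lets mutations that never co-occur on a read still be combined.
est, *_ = np.linalg.lstsq(signatures, observed, rcond=None)
est = np.clip(est, 0, None)
est /= est.sum()
print(est)  # close to the hidden true_mix
```

Even this crude version recovers the hidden proportions from noisy per-site frequencies, which is the core of why the RNAseq analogy works.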
At the same time, Alessandro Zulli in @jordan_peccia's lab was sequencing from wastewater, and @IsabelOtt and @NathanGrubaugh's crew were looking at variants in clinical samples in the exact same area. A perfect case to try it out! 5/
It worked pretty well! Importantly, though, we found that quantification of individual wastewater samples appeared noisy, but regional trends in clinical abundance were well captured from the wastewater 6/
We then worked with @cduvallet and team at @BiobotAnalytics, to test our technique on samples from sixteen sites across eight states sequenced with a different workflow. Again, quite noisy, but the point is that this is a fairly general technique that works on many data types 7/
As you might expect, it’s important that the wastewater has low Ct for sequencing to have enough molecules to capture enough diversity to estimate variant abundance. From testing we found that while the accuracy remained decent with low coverage, precision dropped off. 8/
Whether the discrepancy between wastewater and clinic comes from the inherent noisiness of wastewater or from noise in the clinical sequence data is impossible to tell. But it does seem that computationally this is about as good as we’re going to do. 9/
So what do we learn from this? You can see SARS-CoV-2 variants in wastewater, but it does seem like clinical sequencing gives a more accurate and earlier picture of what’s happening. For places without robust clinical sequencing, this could help pandemic surveillance. 10/
While right now the US at least is all Delta all the time, we all know that may change again. And in any case, this is far from the last pandemic we will see, and far from the last time we will need to monitor real-time pathogen evolution on a population scale. 11/11
Coda, since it's come up: One of the key things we did to make this work was to include multiple reference sequences per variant lineage. That way instead of being thrown off by natural diversity within the lineage, that diversity actually helped make predictions even better!
I don't think it's widely appreciated how incredible an achievement this is. Biotechnology has advanced unbelievably in the last fifteen years, but even still, going from new virus to completed phase 3 clinical trials in eleven months is like... I can't come up with a good metaphor
Maybe announcing a Mars program and landing a crew twelve months later? It's certainly on par with the Manhattan project.
Though the dissonance of this incredible technical achievement against the tragedy of our utter failure of public health leadership and policy is jarring
Excited to share my latest preprint with @LeeKShaffer and @BillHanage, “Perfect as the Enemy of the Good: Using Low-Sensitivity Tests to Mitigate SARS-CoV-2 Outbreaks” in which we show how the math of superspreading events can improve contact tracing 1/ dash.harvard.edu/handle/1/37363…
The key idea is: if A is sick and has contacted B, B is probably still fine, but if you also know that A has infected C then there's a much better chance that B has been infected. Superspreading (or overdispersion) means that infection _events_ are correlated 2/
Here's where testing comes in: because B and C being infected is correlated, you don't need a test that gets both of them right all the time. Either one testing positive gives you information. So a low sensitivity test on all of A's contacts is almost as good as a perfect one. 3/
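The correlation from superspreading can be demonstrated with a quick Monte Carlo. This is an illustrative sketch, not the model from the preprint: the dispersion parameter, per-contact transmission probability, and the assumption that both contacts share the index case's infectiousness are all choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 200_000
k = 0.16        # hypothetical dispersion parameter (strong superspreading)
mean_p = 0.1    # hypothetical mean per-contact transmission probability

# Each index case A draws an infectiousness from a gamma distribution;
# contacts B and C share A's infectiousness, so their infections correlate.
infectiousness = rng.gamma(shape=k, scale=mean_p / k, size=n_sims)
p = np.clip(infectiousness, 0.0, 1.0)
b_infected = rng.random(n_sims) < p
c_infected = rng.random(n_sims) < p

p_b = b_infected.mean()
p_b_given_c = b_infected[c_infected].mean()
print(f"P(B infected)              ~ {p_b:.3f}")
print(f"P(B infected | C infected) ~ {p_b_given_c:.3f}")
```

Under these assumptions, learning that C was infected multiplies the probability that B was infected several-fold, which is exactly the correlation a low-sensitivity test can exploit.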
I see a lot of motivated reasoning as to why this can't be as bad as serious models predict it will be without massive societal action. And I know these are desperate attempts to reason why the world must be similar to past experience, but it's hard to be sympathetic.
But all those thinkpieces two or three weeks ago, what did they accomplish? They sowed just enough doubt to slow action (and apparently some Medium posts got the ear of the White House). And now we are seeing the tragic consequences of insisting the world must be as you hope.
The curve is bending, our fates are not fully sealed. Hold the line. Keep distancing, be safe to reduce the background rate of hospitalization. Listen to epidemiologists with the experience to understand the complications and nuance. This will be a long fight.
I just did an updated calculation of what happens to America if we do nothing. And it is nothing short of terrifying.
The current rate of spread is a near-perfect exponential. If we do not change our behavior dramatically and fast, here is what the math says: 1/n
The last eleven days give a remarkably good fit for linear regression on the log cases (R^2=0.9981), that's good enough to project the exponential. Here's what happens:
~March 18th the US passes 10k cases
~March 26th we pass 100k cases
~April 4th the US passes 1 million cases
If 10% require hospitalization:
~April 5th the US passes 1.6M cases, or 10x the number of ventilators
~April 11th we pass 9.24M cases, or 10x the total hospital beds
If we do not change our behavior, by early April the entire US medical system will be treating critical Covid-19 cases
The reason to cancel meetings and seminar visits is the same reason we have them in the first place: by establishing long-distance connections and high-connectivity nodes, we help ideas spread much faster through our social networks. It's the same for a virus.
More math: Locally, early in an outbreak, the expected impact of a social event scales as the number of people times the number each interacts with. Roughly the attendance squared.
Therefore cancelling a 50 person event is over 1,000 times as important as cancelling a 1-1.
Of course this eventually becomes linear, at a 10,000 person event you can't possibly interact with everyone. But the point remains that from an outbreak-spreading perspective large events are disproportionately more important than small ones.
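The quadratic scaling above is just counting pairs, which a two-line check confirms:

```python
def pairwise_contacts(n: int) -> int:
    """Number of distinct pairs at an n-person gathering: n choose 2."""
    return n * (n - 1) // 2

# A 50-person event vs a one-on-one meeting (2 people):
ratio = pairwise_contacts(50) / pairwise_contacts(2)
print(ratio)  # 1225.0 -- over a thousand times the contact pairs
```

For large n, n(n-1)/2 grows like n^2, which is where "roughly the attendance squared" comes from; the linear regime kicks in once attendees can no longer plausibly contact everyone.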
It’s the season for grad school interviews. I’ve been doing these a couple years now (for a few different programs), and in the interest of dismantling the hidden curriculum, here’s how I’d interview you and what I’d look for: 1/
(Before I go on I want to emphasize that this is just how one person at one school does interviews. It is not universal, and you should take this as a data point and nothing more.) 2/
Most importantly, I want to see that you are prepared for the rigors of grad school. This means that you have the academic training (and if you made it to the interview you almost certainly already do) to not be overwhelmed and the motivation to push through the difficulties. 3/