Marc Johnson
Oct 31, 2025
Can you take a quarter cup of composite sewage, simply ask ‘what’s in there?’, and find out all of the pathogens circulating in that community?

That is the question we asked in our latest pre-print.

Turns out you can.
1/
medrxiv.org/content/10.110…
We are not the first group to do unbiased sequencing of wastewater to monitor circulating viruses, but I think we are the first to ever do it at this scale.

Weekly wastewater samples for 18 months, totaling over 85 billion sequence reads.

2/
Among the ‘known’ viruses, there was a fairly even split between bacterial viruses (phages) and eukaryotic viruses.
That was just raw reads, though. If you look at diversity, there were considerably more species of phages.
3/
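To make the reads-versus-diversity distinction concrete, here is a minimal sketch. The labels and counts are invented for illustration and are not from the preprint; the point is only that two groups can have equal read counts while one contains far more species.

```python
from collections import Counter

# Hypothetical classified reads as (viral group, species) pairs.
# All names and counts are made up for illustration only.
reads = (
    [("eukaryotic", "virus_A")] * 45
    + [("eukaryotic", "virus_B")] * 5
    + [("phage", f"phage_{i}") for i in range(10)] * 5
)


def summarize(classified_reads):
    """Return (reads per group, distinct species per group)."""
    read_counts = Counter(group for group, _ in classified_reads)
    species_by_group = {}
    for group, species in classified_reads:
        species_by_group.setdefault(group, set()).add(species)
    richness = {group: len(s) for group, s in species_by_group.items()}
    return dict(read_counts), richness


read_counts, richness = summarize(reads)
print(read_counts)  # equal read counts for both groups
print(richness)     # but many more distinct phage species
```

Counting distinct species is the simplest possible diversity measure; it is used here only because it matches the "more species" wording above.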
Focusing on the eukaryotic viruses, the vast majority are Virgaviridae, which infect plants.
4/
Which plant viruses, you may ask? The most prominent one is a virus called Tomato Brown Rugose Fruit Virus (ToBRFV).

This was true everywhere in the country.

Americans eat a lot of tomatoes.
5/
What’s sort of surprising is that ToBRFV isn’t even in US tomatoes.
That explains why we see it year-round, though. The sequences probably come from imported tomatoes and tomato products.
6/
en.wikipedia.org/wiki/Tomato_br…
The one time of year when the ToBRFV proportion goes down is late summer, when it is partially displaced by tomato mosaic virus (which does infect US tomatoes).

This probably reflects people eating more local tomatoes when they are in season.
7/
Although human pathogens were a tiny fraction of the total sequences, there were still plenty of sequences to figure out which human pathogens were circulating.
8/
There was only one respiratory virus that was present year-round. You guessed it, SARS-CoV-2.

It’s still here.
9/
We were also monitoring SARS-CoV-2 in these samples the old-fashioned way (dPCR) and it was nice to see that the amount detected from sequencing (normalized or not normalized) correlated pretty well with the dPCR results.
10/
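For anyone who wants to try this kind of comparison on their own data, here is a rough sketch. The thread doesn't say which normalization or statistic was used, so reads-per-million and Pearson correlation are assumptions on my part, and every number below is made up.

```python
import math


def reads_per_million(target_reads, total_reads):
    """Normalize a target read count to reads per million total reads."""
    return target_reads / total_reads * 1_000_000


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical weekly values (not data from the study).
sc2_reads = [120, 300, 800, 450, 200]            # SARS-CoV-2 reads per sample
total_reads = [1.0e9, 1.2e9, 0.9e9, 1.1e9, 1.0e9]  # all reads per sample
dpcr_copies = [1.0e4, 2.4e4, 7.5e4, 3.8e4, 1.9e4]  # dPCR copies per liter

normalized = [reads_per_million(r, t) for r, t in zip(sc2_reads, total_reads)]

r_raw = pearson(sc2_reads, dpcr_copies)
r_norm = pearson(normalized, dpcr_copies)
print(f"raw reads vs dPCR:        r = {r_raw:.3f}")
print(f"normalized reads vs dPCR: r = {r_norm:.3f}")
```

Running both versions side by side mirrors the "normalized or not normalized" comparison above.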
We also detected all of the other human coronaviruses and the influenza viruses. They were all most prevalent January-March, as expected.
10/
Other respiratory viruses circulated later. For instance, parainfluenza 3 circulated from April-June in both years.

This was expected, but I still don’t understand it epidemiologically.
Why then?
11/
Most of the rhinoviruses also circulated in the spring, but as I’ve noted before, the specific serotypes changed from year to year.
12/
You can see a much more detailed readout of the rhinoviruses (with more sites and dates) on our dashboard.
13/
dholab.github.io/public_viz/004…
We also saw a year-to-year turnover in parechovirus serotypes (parechovirus can cause meningitis).

This exactly matched what our colleagues down the road in KC found in pediatric patients.

14/
pubmed.ncbi.nlm.nih.gov/40712199/
There were also a few fall respiratory viruses we detected: enterovirus D68 (expected), and an off-season surge of rhinovirus C42 (not expected, but nationwide).
15/
There were other things we saw that we REALLY didn’t expect. For instance, influenza H5N1 B3.13 (which wasn't in MO) appeared in the spring of 2024.

We’re pretty sure this came from a dairy in town that imports its milk from Texas, where H5N1 was rampant at that time.
16/
The data in this paper is from one sewershed, Jan. 2024 to June 2025.
If you want to see more sites and more recent data, visit our dashboard. It’s updated at least once a week.
17/
lungfish-science.github.io/wastewater-das…
I think this kind of surveillance is the way of the future, but we’re not there yet.

1. Price: too expensive.
2. Unknowns: most sequences are 'unknown'.
3. False positives: requires careful curation.

18/
Price. Right now the sequencing alone costs roughly $500-1000 per sample.
However, it keeps going down. There was a noticeable decrease in price even over the course of this study.
19/
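A back-of-envelope sketch of what those numbers imply for a study like this one. The sample count is my assumption (one sample per week for 18 months, about 78 samples); the thread gives only "weekly", "18 months", "over 85 billion reads", and the per-sample price range.

```python
# Assumption: exactly one sample per week for 18 months.
WEEKS_PER_MONTH = 52 / 12
months = 18
n_samples = round(months * WEEKS_PER_MONTH)

# Figures quoted in the thread.
price_low, price_high = 500, 1000   # USD per sample, sequencing only
total_reads = 85e9                  # "over 85 billion", so a lower bound

cost_low = n_samples * price_low
cost_high = n_samples * price_high
reads_per_sample = total_reads / n_samples

print(f"samples: {n_samples}")                               # samples: 78
print(f"sequencing cost: ${cost_low:,} to ${cost_high:,}")   # $39,000 to $78,000
print(f"reads per sample: {reads_per_sample / 1e9:.2f} billion")  # 1.09 billion
```

This covers sequencing only, for a single site; sample prep, compute, and curation time are not included.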
Unknowns. A very large portion of the sequences in this dataset is ‘dark matter’: sequences from species (largely viruses) that have never been characterized.
(That’s what our next manuscript will be about.)
20/
False positives. Even many human viruses are not characterized.

Every week we find sequences whose closest match is polio, but it's never polio.

It’s always viruses related to polio that just aren’t in the database.

It's annoying and time-consuming to check them all.
21/
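One generic way to triage this kind of hit, sketched below, is to separate "closest match" from "close match" using percent identity. This is not our actual curation pipeline; the `Hit` record, the 95% cutoff, and the example hits are all hypothetical, and a real check would also look at alignment length and coverage.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    best_match: str          # name of the closest database entry
    percent_identity: float  # identity of the alignment to that entry


# Hypothetical cutoff, chosen for illustration only.
IDENTITY_CUTOFF = 95.0


def triage(hits, target, cutoff=IDENTITY_CUTOFF):
    """Split hits whose best match is `target` into two lists:
    likely true matches (identity >= cutoff) and probable relatives
    that merely lack a closer database entry (identity < cutoff)."""
    likely, related = [], []
    for hit in hits:
        if hit.best_match != target:
            continue
        if hit.percent_identity >= cutoff:
            likely.append(hit)
        else:
            related.append(hit)
    return likely, related


# Made-up example hits.
hits = [
    Hit("read_1", "poliovirus", 82.4),
    Hit("read_2", "poliovirus", 79.1),
    Hit("read_3", "rhinovirus C42", 99.0),
]

likely, related = triage(hits, target="poliovirus")
print(len(likely), "likely polio;", len(related), "polio-like relatives to review")
```

A filter like this only sorts the pile; anything near the cutoff still needs a human to look at it.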
Eventually sequencing will be cheaper, databases will be more complete, and this kind of study will be routine (maybe in 5-10 years).

Meanwhile, keep pooping, we take this shit seriously.

Thanks to our collaborators/funders:
Inkfish
@SecureBio

22/22
naobservatory.org/casper/
