Russ Corbett
May 20
10,048,466! That’s a lot of #SARSCoV2 genomes in the single largest phylogeny ever that we update and optimize every single day! Here, I’ll explain how we are doing pandemic-scale phylogenomics.
We start by aggregating all of the new SARS-CoV-2 genomes from @GISAID, @NCBI, and @CovidGenomicsUK. After QC, we add each genome to the ever-growing phylogeny using @yatishturakhia’s amazing tool, UShER: nature.com/articles/s4158…
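To make the "add each genome to the tree" step concrete, here is a toy sketch of parsimony-based placement. It is not UShER's actual algorithm or API: the real tool works on a mutation-annotated tree and also considers splitting branches. The function `place_sample`, the node names, and the mutation strings are all hypothetical, and the cost used here (mutations the sample has that the node lacks, plus mutations the node has that the sample lacks) is a simplifying assumption.

```python
def place_sample(node_genotypes, sample_mutations):
    """Pick the existing node whose genotype needs the fewest extra mutations.

    node_genotypes: dict mapping a node name to the set of mutations
        accumulated on the path from the root to that node (assumed input).
    sample_mutations: set of mutations observed in the new sample.
    Returns (best_node, extra_mutations).
    """
    def cost(node):
        # Symmetric difference = new mutations + reversions needed.
        return len(node_genotypes[node] ^ sample_mutations)

    # Sort names so that ties are broken deterministically.
    best = min(sorted(node_genotypes), key=cost)
    return best, cost(best)


# Illustrative data only; these strings are just labels for the example.
tree_nodes = {
    "root": frozenset(),
    "A": frozenset({"C241T"}),
    "B": frozenset({"C241T", "A23403G"}),
}
new_sample = frozenset({"C241T", "A23403G", "G28881A"})

print(place_sample(tree_nodes, new_sample))  # ('B', 1)
```

In this sketch the sample attaches under node B with one extra mutation. Because every sample is placed greedily against the tree as it exists at that moment, early choices can turn out to be suboptimal later on.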
UShER places samples one by one, and sometimes it infers suboptimal trees. Thanks to nifty engineering, our tool, matOptimize, uses SPR (subtree pruning and regrafting) moves to optimize the entire 10M sample tree every day! biorxiv.org/content/10.110…
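For anyone who wants to see what an SPR move does to a parsimony score, here is a minimal sketch on a four-sample toy tree. It is not matOptimize's implementation, which is heavily engineered to scale to millions of samples. The sketch assumes a rooted binary tree written as nested tuples with unique leaf names, scores it with the standard Fitch algorithm, and tries every prune-and-regraft move exhaustively. All function names here are hypothetical.

```python
def fitch(tree, states):
    """Fitch algorithm for one site: return (possible_states, mutation_count)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # children disagree: one mutation needed


def parsimony_score(tree, alignment):
    """Sum the Fitch mutation counts over every alignment column."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def subtrees(tree):
    """Yield the tree itself and every subtree, leaves included."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def prune(tree, target):
    """Cut `target` out of `tree`; its sibling takes the parent's place."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def regraft(tree, at, subtree):
    """Attach `subtree` on the branch leading to node `at`."""
    if tree == at:
        return (tree, subtree)
    if isinstance(tree, str):
        return tree
    return (regraft(tree[0], at, subtree), regraft(tree[1], at, subtree))


def best_spr_move(tree, alignment):
    """Try every single SPR move and keep the lowest-scoring tree."""
    best_tree, best_score = tree, parsimony_score(tree, alignment)
    for sub in subtrees(tree):
        if sub == tree:
            continue  # cannot prune the whole tree
        remainder = prune(tree, sub)
        for at in subtrees(remainder):
            candidate = regraft(remainder, at, sub)
            score = parsimony_score(candidate, alignment)
            if score < best_score:
                best_tree, best_score = candidate, score
    return best_tree, best_score


# Toy data: A and B share one sequence, C and D share another.
alignment = {"A": "AA", "B": "AA", "C": "TT", "D": "TT"}
start = (("A", "C"), ("B", "D"))  # deliberately bad grouping
improved, score = best_spr_move(start, alignment)
print(parsimony_score(start, alignment), "->", score)  # 4 -> 2
```

A single move that puts C next to D halves the number of mutations the tree requires. The brute-force search here rescores the whole tree for every candidate, which is only workable on toy examples.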
Amazingly, Bryan Thornlow and @alexkramer_ showed that the trees we infer using parsimony have slightly higher likelihoods than those inferred by many maximum-likelihood programs, and our approach takes only a tiny fraction of the time. biorxiv.org/content/10.110…
I don’t think that parsimony is inherently better than likelihood. We think the difference is that matOptimize is able to search much more tree space in the same amount of time, and that the super-dense sampling of SARS-CoV-2 genomes is very well suited to parsimony.
Also check out @Nicola_De_Maio’s amazing “MAximum Parsimonious Likelihood” (MAPLE) method as a way to get the best of both worlds in the near future! biorxiv.org/content/10.110…
Thanks to @AngieSHinrichs, UShER is now a crucial part of genomic epi. It’s the primary basis for new @PangoNetwork lineage designations and assignments, and users worldwide upload their samples to visualize relationships with the other 10M genomes. genome.ucsc.edu/cgi-bin/hgPhyl…
We have had a ton of people contribute! Some of the main folks (in no particular order and certainly non-exhaustive): @AngieSHinrichs, @yatishturakhia, Jakob McBroome, Bryan Thornlow, @alexkramer_, David Haussler, @RobLanfear, @EBIgoldman, @Nicola_De_Maio, and many more.
And of course, sampling all these genomes has been a global effort! Sequence producers all over the world rush to share their data, genomic databases aggregate and distribute these data, and a kajillion open-source bioinformatics tools analyze them.
