Q: Which is better for taxonomic classification of #metagenomics samples - Kraken2 or MetaPhlAn 3?
A: It really depends!
Read the very short story in this thread, or the full story in my preprint w/ @BetaScience and André Comeau: bit.ly/3EWkYJf
1/
Now, you may be thinking "but aren't there loads of studies that compare different metagenomic taxonomic classifiers already?" (& you would be right), but what they don't do is compare the impact of different parameters and reference databases on the classifications.
2/
What started about two years ago as a quick test of which tool/parameters we should use to classify some samples took on a whole life of its own, and I'm really pleased it's finally out there. If you use Kraken or MetaPhlAn, I think some of this will be useful to you.
3/
We noticed that running MetaPhlAn on metagenome [MG] samples might identify only 10 species, or even none at all, while running Kraken on the same samples might identify up to tens of thousands of species.
4/
To investigate this systematically, we collected ~400 previously made simulated/mock MG samples with known compositions and classified them with both Kraken2 and MetaPhlAn 3. We made several different Kraken2 databases (all available on Dropbox)...
5/
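A minimal sketch of how a profile can be scored against a mock sample with a known composition. The thread doesn't say which metrics were used, so precision, recall and F1 on species presence/absence are an assumption here, and the function name and species lists are made up for illustration.

```python
def score_profile(predicted, expected):
    """Compare predicted species against the known composition (presence/absence only)."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Guard against empty sets, e.g. a tool that identifies no species at all.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical mock community and hypothetical classifier output.
truth = {"Escherichia coli", "Bacillus subtilis",
         "Staphylococcus aureus", "Listeria monocytogenes"}
called = {"Escherichia coli", "Bacillus subtilis", "Salmonella enterica"}

precision, recall, f1 = score_profile(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A tool that reports very few species tends to score high on precision and low on recall, and a tool that reports thousands tends to do the opposite, which is why both numbers matter.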
And we tested out almost all of the tool-parameter-database [DB] combinations that we could (and took a really deep dive into Kraken2's confidence threshold), giving >60,000 taxonomic profiles for us to analyse.
6/
We show that if you run Kraken2 with the default parameters & DB then it really performs quite badly. But with a higher confidence threshold & a complete DB, it outperforms MetaPhlAn.
7/
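A rough sketch of what sweeping Kraken2's confidence threshold could look like. It only builds and prints the command lines (it doesn't run anything). `--db`, `--threads`, `--confidence`, `--report` and `--output` are standard kraken2 options, but the database name, read file, output names and the threshold values are placeholders, not the settings from the preprint.

```python
import shlex


def kraken2_command(db, reads, confidence, out_prefix, threads=4):
    """Build a kraken2 command line for a single confidence threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    tag = f"{confidence:.2f}"
    return [
        "kraken2",
        "--db", db,
        "--threads", str(threads),
        "--confidence", tag,  # kraken2's default is 0.0
        "--report", f"{out_prefix}.conf{tag}.kreport",
        "--output", f"{out_prefix}.conf{tag}.kraken",
        reads,
    ]


# Placeholder thresholds, database and input file (assumptions for illustration).
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
commands = [
    kraken2_command("my_kraken_db", "sample.fastq", c, "sample")
    for c in thresholds
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell or passed to `subprocess.run`, giving one report per threshold to compare.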
The best tool-parameter-DB combination also depends on sample characteristics (we provide some guidance on how to choose this for your samples), but in general, we were able to achieve higher performance with Kraken than MetaPhlAn.
8/
Although we do want to highlight that MetaPhlAn is *really* quick (& easy) to install & run (& computational requirements are very low). It also performs pretty well out-of-the-box on these samples (i.e. no optimisation required).
9/
So there isn't really a one-size-fits-all "best" option (particularly when also considering available computational resources) but if you've just been running Kraken with the default DB & parameters then you can likely achieve more accurate classification than you have been.
10/
And finally, this is just a preprint at the moment so we'd welcome any feedback you may have on it 🙂
11/11