A colleague asked me for some advice about 16S rDNA sequencing... So here is what I've learned during my PhD, even though I worked more with metagenomics than with amplicon sequencing. Constructive additions are welcome.
Traditionally, 16S amplicon sequences were clustered at 97% identity to create operational taxonomic units (OTUs), a unit that corresponds more or less to species, or so we thought. academic.oup.com/bioinformatics…
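To make the 97% idea concrete, here is a toy sketch of greedy centroid clustering. It is not how any particular tool does it: it assumes the sequences are already aligned and of equal length, and `identity` and `greedy_otus` are hypothetical helper names made up for this example. Real pipelines compute identity from alignments and usually process sequences in order of abundance.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid (a new OTU).
    Returns the list of centroids and the OTU index of every input sequence.
    """
    centroids = []
    assignment = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                assignment.append(i)
                break
        else:
            # No centroid was close enough: start a new OTU with this sequence.
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return centroids, assignment


# Made-up 100 bp example sequences (not real 16S data).
ref = "ACGT" * 25
one_mismatch = "T" + ref[1:]          # 99% identical to ref
five_mismatch = "CATGC" + ref[5:]     # 95% identical to ref

centroids_97, otus_97 = greedy_otus([ref, one_mismatch, five_mismatch], threshold=0.97)
centroids_100, otus_100 = greedy_otus([ref, one_mismatch, five_mismatch], threshold=1.0)
```

At 97% the single-mismatch variant is folded into the same OTU as the reference, while at 100% every distinct sequence ends up in its own unit.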
Now, everybody uses @bejcal's dada2 or equivalent tools to get ribosomal sequence variants (essentially OTUs at 100%) while controlling for sequencing error. Keep in mind this is not the same as full-length 16S clustered at 100%.
Single-nucleotide-resolved RSVs may not always be what you want, e.g. if you want to compare datasets from different studies (and different variable regions). In this case, MAPseq is a cool solution. academic.oup.com/bioinformatics…
Taxonomy is obviously always somewhat complicated. @ace_gtdb does a very good job of clarifying the taxonomy of prokaryotes, e.g. the Clostridiales, which always pop up in 16S sequencing. And there is a 16S database for it. benjjneb.github.io/dada2/training…
The publication from @gbgloor was really game-changing for how I analyze microbiome data. frontiersin.org/articles/10.33…
If we understand microbiome data as compositions, we can use powerful #coda methods for the statistical analysis of microbiome data.
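A minimal sketch of what "treating the data as compositions" can look like in practice is the centered log-ratio (CLR) transform, one common starting point in compositional data analysis. The function name `clr` and the default pseudocount of 0.5 are assumptions made for this example; zeros have to be handled somehow before taking logs, and a fixed pseudocount is only one option.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector.

    Each value becomes log(count) minus the mean of the logs, i.e. the log of the
    count relative to the geometric mean of the sample.
    The pseudocount is an assumption of this sketch, added so zeros can be logged.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("all values must be positive after adding the pseudocount")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Made-up counts: the same community sequenced at two different depths.
shallow = clr([10, 20, 70], pseudocount=0)
deep = clr([100, 200, 700], pseudocount=0)

# A sample containing a zero needs the pseudocount (or another zero treatment).
with_zero = clr([0, 5, 95])
```

Without a pseudocount, `shallow` and `deep` come out the same up to floating-point error, which is the point: only the ratios between taxa carry information, not the sequencing depth.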
Besides ALDEx2, there is also @Alex_Washburne's phylofactor, a cool #coda tool that can find significantly changed OTUs and aggregate them at the most meaningful level.