Today's #longreadclub episode will stream live at 11am with @koadman, where we'll ask him about @longastech, who are promising a method to generate long reads using short-read platforms, resulting in accurate, single-contig de novo assemblies!
First watch:
(1/n)
@koadman @longastech Aaron is the CSO and co-founder of Longas Technologies, and an academic bioinformatician at UTS Sydney, who has been responsible for some key bioinformatics software and algorithms, notably Mauve/progressiveMauve and Phylosift. #longreadclub
@koadman @longastech Aaron starts by introducing the two variables that most influence assembly: read length and coverage, and reiterates @torstenseemann's two laws of assembly:
#longreadclub
@koadman @longastech @torstenseemann Long read platforms (ONT, PacBio, linked reads like 10X) should solve these problems but Aaron notes that the databases are not yet filling up with complete microbial genomes relative to draft genomes - why?
#longreadclub
@koadman @longastech @torstenseemann Introducing Morphoseq: a way of getting long "virtual" reads from short-read platforms like Illumina. The basic principle is to mutagenise the sample to actually remove repeats: each read gets a unique signature of mutations!
#longreadclub
@koadman @longastech @torstenseemann Process: tagment the sample to make long fragments (10kb), perform mutagenesis by incorporating nucleotide analogues like dPTP by PCR, add sample barcodes, then replace dPTP with (random) natural nucleotides, size select and then perform enrichment PCR.
@koadman @longastech @torstenseemann The bioinformatics process also requires unmutated data: make a short-read assembly graph, then map the mutated reads onto the assembly graph. Follow the breadcrumbs of mutated reads to give you a unique path through repeats! Nifty.
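To make the "unique signature" idea concrete, here is a toy sketch, not the Longas pipeline. It assumes a simplified mutagenesis model (independent transition mutations at about 7% per base, in line with the 6-8% rate mentioned in the talk) and uses hypothetical helper names (`mutagenise`, `hamming`, `assign_read`). It only illustrates why two molecules carrying the same repeat become distinguishable once each has its own mutation pattern; the real method works on an assembly graph.

```python
import random

# Assumption for this sketch: only transition mutations (A<->G, C<->T).
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutagenise(template, rate, rng):
    """Copy `template`, swapping each base for its transition partner with probability `rate`."""
    return "".join(TRANSITIONS[b] if rng.random() < rate else b for b in template)


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read, offset, molecules):
    """Return the index of the molecule that best matches `read` at `offset`."""
    window = len(read)
    return min(
        range(len(molecules)),
        key=lambda i: hamming(read, molecules[i][offset:offset + window]),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # One 10 kb sequence standing in for a repeat that occurs twice in a genome.
    repeat = "".join(rng.choice("ACGT") for _ in range(10_000))
    # Each long fragment is mutagenised independently.
    copy_a = mutagenise(repeat, 0.07, rng)
    copy_b = mutagenise(repeat, 0.07, rng)
    print("differences between copies:", hamming(copy_a, copy_b))
    # A 150 bp short read drawn from copy B is matched back to its source molecule.
    read = copy_b[3_000:3_150]
    print("read assigned to molecule:", assign_read(read, 3_000, [copy_a, copy_b]))
```

Without mutagenesis the two copies would be identical and a 150 bp read could not be placed; with it, nearly every short window carries a pattern specific to its source molecule.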
@koadman @longastech @torstenseemann Evaluated using 60 microbial genomes available from BEI with fairly low yield and quality, as well as 3 genomes with varying GC contents that they also generated nanopore data for:
#longreadclub
@koadman @longastech @torstenseemann What does the data look like?
Lengths look good, and the induced mutation rate is around 6-8% regardless of GC content:
#longreadclub
@koadman @longastech @torstenseemann When you inspect the raw reads you can see the mutated sequences (plus sequencing error). Then these reads can be reconstructed into long mutated reads:
#longreadclub
@koadman @longastech @torstenseemann So can these long reads help with assembly? Using the long reads and short reads in @rrwick's Unicycler pipeline: able to reconstruct a low GC Arcobacter organism into a single contig! #longreadclub
@koadman @longastech @torstenseemann @rrwick When applied to the BEI data many of the genomes are coming out as circular contigs or at least improved: but room for gains in the software. #longreadclub
@koadman @longastech @torstenseemann @rrwick Accuracy matters: 90% accuracy read != 99% accuracy read for de novo assembly. Shows chart from Jain et al. 2018 showing effect of accuracy on NG50 (nature.com/articles/nbt.4…), recently reused in PacBio HiFi read paper out today; nature.com/articles/s4158… #longreadclub
@koadman @longastech @torstenseemann @rrwick Unlike linked reads, Morphoseq can resolve through complex local repeats (VNTRs, microsatellites, etc.). It is also better at resolving gene calls (using @BioMickWatson's broken gene predictor) than nanopore-only assemblies:
#longreadclub
@koadman @longastech @torstenseemann @rrwick @BioMickWatson Looking at cost: if you try to generate 135x short read data and 15x long, you can theoretically assemble a complete E. coli genome for the price of an extra value meal at McDonald's on NovaSeq S4 (at least with respect to sequencing cost)
#longreadclub
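For a sense of scale, the coverage figures above can be turned into a rough data volume. This is a back-of-the-envelope sketch that assumes an E. coli genome of about 4.6 Mb and reads the 135x and 15x figures as two separate coverage depths; neither assumption comes from the talk, and no prices are modelled. The function name `bases_required` is hypothetical.

```python
# Assumption: approximate E. coli genome size in base pairs.
ECOLI_GENOME_BP = 4_600_000


def bases_required(genome_bp, coverage):
    """Total sequenced bases needed to reach a given mean coverage depth."""
    return genome_bp * coverage


if __name__ == "__main__":
    short_bp = bases_required(ECOLI_GENOME_BP, 135)
    long_bp = bases_required(ECOLI_GENOME_BP, 15)
    print(f"short-read data: {short_bp / 1e6:.0f} Mb")
    print(f"long-read data:  {long_bp / 1e6:.0f} Mb")
    print(f"total:           {(short_bp + long_bp) / 1e6:.0f} Mb")
```

Under these assumptions the total comes to well under 1 Gb per genome, a small fraction of a high-output flow cell, which is why the per-genome sequencing cost can be so low when many samples are multiplexed.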
@koadman @longastech @torstenseemann @rrwick @BioMickWatson Summing up here:
Data available, looking forward to the bioRxiv preprint!
#longreadclub
@koadman @longastech @torstenseemann @rrwick @BioMickWatson Thanks @koadman! We'll be sitting down to chat with him in just under 2 hours (11am UK time). If you have questions you'd like to ask him, pop them on the end of this thread, or on the YouTube video, or in the comments of the live Q&A stream when we post the link later.
@koadman @longastech @torstenseemann @rrwick @BioMickWatson Go check out the archived Q&A with @koadman at: