Identification of m6A has been possible for some time using @nanopore direct RNA-Seq. Today we published m6Anet, which detects m6A with high accuracy at single-base, single-molecule resolution: nature.com/articles/s4159…
What is m6Anet, and why does it make a big difference? 🧵 #biodata22 @naturemethods
m6Anet (github.com/GoekeLab/m6anet) implements a supervised machine learning approach, i.e. it uses labeled data (direct RNA-Seq data at known m6A sites) to train a model that then makes predictions on data without any labels.
However, not all reads at a known m6A position are modified: the label applies to the site, not to each individual read. Learning from labels of this kind is known as a multiple instance learning (MIL) problem. m6Anet models m6A detection as a MIL problem, thereby automatically handling the mismatch between data and labels.
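To make the MIL idea concrete, here is a minimal sketch of one common way to turn per-read probabilities into a site-level probability: noisy-OR pooling, where a site counts as modified if at least one of its reads is, assuming reads are independent. The function names `site_probability` and `bag_log_loss` are hypothetical, and this is an illustration of the general technique, not a copy of the m6Anet code (see the repo and paper for the actual model).

```python
import math
from typing import Sequence


def site_probability(read_probs: Sequence[float]) -> float:
    """Noisy-OR pooling (illustrative): P(site) = 1 - prod(1 - p_i).

    Each p_i is a per-read probability of being modified. Reads are
    assumed independent, so the site is unmodified only if every read is.
    """
    if not read_probs:
        raise ValueError("a site needs at least one read")
    prob_none_modified = 1.0
    for p in read_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"read probability out of range: {p}")
        prob_none_modified *= 1.0 - p
    return 1.0 - prob_none_modified


def bag_log_loss(read_probs: Sequence[float], site_label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy computed on the site (bag) label only.

    No read-level labels are needed: the loss only sees the pooled
    site probability, which is what makes this a MIL setup.
    """
    p = min(max(site_probability(read_probs), eps), 1.0 - eps)
    return -(site_label * math.log(p) + (1 - site_label) * math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-read probabilities at one site: only one read looks modified.
    reads = [0.05, 0.9, 0.1]
    print(round(site_probability(reads), 4))  # 1 - 0.95 * 0.1 * 0.9
```

The point of the design: a single confidently modified read is enough to push the site probability up, so the model is not penalised for the many unmodified reads that sit at a true m6A site.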
This leads to very accurate m6A predictions on human data (which we used for training), and even in other species such as Arabidopsis.
In fact, the data suggest that m6A predictions from direct RNA-Seq are comparable to those from standard NGS-based m6A profiling methods.
But unlike these, m6Anet only requires a single direct RNA-Seq run, achieves single-base resolution, and can make predictions for individual RNA molecules/reads ...
making m6A profiling possible even without access to specialised protocols. This is what we believe makes the difference: almost any team with access to a sequencing platform can now profile RNA modifications at high accuracy
This work was led by the brilliant @christopherhendra, who did his PhD work jointly in my team and with @alexthierymany. Many thanks also to @ploy_rukawa, @yukkeiwan and @ShoGohLab, who have shaped and improved this work in many ways!
We were very lucky to have had three fantastic reviewers who made this manuscript much better, many thanks to @mason_lab, @1m1a2t and Franz Josef Müller. All peer review reports are available online at @naturemethods; have a look, they are very informative: nature.com/articles/s4159…
If you are attending #biodata22, look out for our poster (192) and reach out to me!
The preprint from the Singapore Nanopore Expression Project (SG-NEx) is online! The SG-NEx data is a systematic resource for @nanopore long-read RNA-Sequencing, including direct RNA, cDNA, and PCR cDNA, with matched short-read data and spike-in RNAs. 1/n biorxiv.org/content/10.110…
First of all, if you want to study the transcriptome, you want to use long reads (they are designed for this). Sometimes that is not possible (throughput, RNA input requirements, and lack of experience with data processing and analysis are the most obvious reasons).
If you are only interested in gene expression, it might actually not make a big difference (there are also microarrays, by the way). In other scenarios, expect a difference: that is where long reads are very likely to help.