Rob Patro (@nomad421), 10 tweets, 4 min read

RNA-seq data is often analyzed at the level of genes. This can provide a robust signal, but can also miss out on biologically important information like differences in isoform composition or dominant isoform usage. 1/n

On the other hand, tremendous progress has been made in transcript-level quantification, but certain inherent ambiguity can remain in the abundance estimates. This results from patterns of multi-mapping where no inference procedure can accurately resolve the origin of reads. 2/n

Yet the total transcriptional output of a group of transcripts sharing these complex multi-mapping patterns will have greatly reduced inferential uncertainty, thus allowing more robust and confident downstream analysis. 3/n
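A toy illustration of that point, with made-up numbers: when two transcripts trade the same reads back and forth across posterior draws, each one looks noisy but their sum does not. The draws below and the variance-to-mean ratio used as an uncertainty measure are assumptions for illustration only, not taken from the pre-print.

```python
from statistics import mean, variance


def relative_uncertainty(samples):
    """Sample variance divided by the mean: a simple stand-in for inferential uncertainty."""
    m = mean(samples)
    return variance(samples) / m if m > 0 else 0.0


# Hypothetical posterior draws of read counts for two transcripts
# that share most of their multi-mapping reads.
tx_a = [10, 30, 20, 40, 0]
tx_b = [40, 20, 30, 10, 50]

# Total output of the group in each draw.
group = [a + b for a, b in zip(tx_a, tx_b)]

print(relative_uncertainty(tx_a))   # large: the split between tx_a and tx_b is ambiguous
print(relative_uncertainty(tx_b))   # large for the same reason
print(relative_uncertainty(group))  # 0.0: the group total is the same in every draw
```

Real data will not cancel this cleanly, but the direction of the effect is the same.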

The idea of grouping together inferentially indistinguishable transcripts, and propagating the remaining uncertainty to downstream analysis, was first suggested by Turro et al. ncbi.nlm.nih.gov/pubmed/24281695. 4/n

We build on these key ideas while introducing a fundamentally new and more efficient algorithm. We introduce a new data-driven approach for grouping together transcripts in an experiment based on their inferential uncertainty. 5/n

This approach is implemented in our tool (written in @rustlang), terminus (github.com/COMBINE-lab/te…). Terminus implements a graph-based algorithm that finds transcriptional groups by greedily selecting the transcript groups that reduce overall inferential uncertainty. 6/n
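To make "greedy, graph-based grouping" concrete, here is a minimal sketch in Python. It is not the terminus implementation: the function names, the variance-to-mean uncertainty measure, the gain score (average uncertainty of the two nodes minus the uncertainty of their sum), and the `min_gain` stopping rule are all assumptions chosen for illustration. Nodes are transcripts carrying posterior draws, and edges connect transcripts that share multi-mapping reads.

```python
from statistics import mean, variance


def uncertainty(samples):
    """Variance-to-mean ratio across draws (an assumed stand-in metric)."""
    m = mean(samples)
    return variance(samples) / m if m > 0 else 0.0


def greedy_groups(samples, edges, min_gain=0.0):
    """Repeatedly collapse the edge whose merge reduces uncertainty the most.

    samples: dict mapping transcript name -> list of posterior draws
    edges:   iterable of (name, name) pairs that share multi-mapping reads
    """
    groups = {n: (frozenset([n]), list(d)) for n, d in samples.items()}
    edges = {frozenset(e) for e in edges}
    while edges:
        best = None
        for e in edges:
            u, v = tuple(e)
            merged = [a + b for a, b in zip(groups[u][1], groups[v][1])]
            before = (uncertainty(groups[u][1]) + uncertainty(groups[v][1])) / 2
            gain = before - uncertainty(merged)
            if best is None or gain > best[0]:
                best = (gain, u, v, merged)
        gain, u, v, merged = best
        if gain <= min_gain:
            break  # no remaining merge reduces uncertainty enough
        # Collapse v into u and redirect v's edges to u.
        groups[u] = (groups[u][0] | groups[v][0], merged)
        del groups[v]
        redirected = set()
        for e in edges:
            e2 = frozenset(u if n == v else n for n in e)
            if len(e2) == 2:  # drop the collapsed edge itself
                redirected.add(e2)
        edges = redirected
    return sorted(sorted(members) for members, _ in groups.values())


# Hypothetical draws: tx_a and tx_b are ambiguous with each other, tx_c is stable.
draws = {
    "tx_a": [10, 30, 20, 40, 0],
    "tx_b": [40, 20, 30, 10, 50],
    "tx_c": [100, 101, 99, 100, 100],
}
result = greedy_groups(draws, [("tx_a", "tx_b"), ("tx_b", "tx_c")])
print(result)
```

In this toy input the ambiguous pair is collapsed into one group, while the well-resolved transcript stays on its own because merging it would not lower the uncertainty any further. A real implementation would likely keep candidate edges in a priority queue instead of rescanning them on every pass.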

There is a cool connection with the (classic) algorithm of Garland and Heckbert for surface simplification (using quadric error metrics) from computer graphics. It's always fun when you get to look at the Stanford bunny :). 5/n

Terminus groups together transcripts in a data-driven manner, allowing transcript-level analysis where it can be confidently supported, and deriving transcriptional groups where the inferential uncertainty is too high to support a transcript-level result. 7/n

Sometimes, even gene-level analysis can have high ambiguity for certain groups of genes (from highly-similar gene families). Terminus takes care of this in one simple and consistent framework; the transcriptional groups are purely data-driven. 8/n

This work was led by @hrksrkr, with contributions from @k3yavi, @hcorrada and @mikelove. You can learn more about terminus, how it works, and what it enables in our new pre-print on @biorxivpreprint. Feedback is welcome! biorxiv.org/content/10.110…. 9/9
