Associate Professor of CS @ University of Maryland. Proud Rust advocate! I ♥ science & compiled, statically-typed programming languages! Views are my own.
Jan 6, 2023 • 20 tweets • 8 min read
Are you interested in performing splice-aware quantification of your #scrnaseq data, obtaining unspliced, spliced, and ambiguous UMI counts quickly & in <3GB of RAM? If so, check out the new manuscript by @DongzeHe, @CSoneson and me on #bioRxiv bit.ly/3vJr0Ji. 1/🧵
Understanding the origin of sequencing reads (the molecules from which they arise, the "gene" with which those molecules are associated, and the splicing status of those molecules) is a key task in single-cell RNA-seq quantification.
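To make the three categories concrete, here is a minimal Rust sketch of one plausible classification rule. It assumes each molecule's reads have already been mapped to reference targets labeled as either spliced (mature transcript) or unspliced (e.g., intronic) sequence. The `TargetKind`, `Status`, and `classify` names are hypothetical, and the rule is a simplification for illustration, not necessarily the exact procedure in the manuscript.

```rust
/// Kind of reference target a read can map to (hypothetical labels for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TargetKind {
    Spliced,   // a mature, spliced transcript
    Unspliced, // an unspliced sequence, e.g. intronic
}

/// Splicing status assigned to one molecule (UMI).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Spliced,
    Unspliced,
    Ambiguous,
}

/// Classify a molecule from the kinds of targets its reads are compatible with.
/// Returns `None` when the molecule is compatible with no target at all.
fn classify(hits: &[TargetKind]) -> Option<Status> {
    let spliced = hits.contains(&TargetKind::Spliced);
    let unspliced = hits.contains(&TargetKind::Unspliced);
    match (spliced, unspliced) {
        (true, false) => Some(Status::Spliced),
        (false, true) => Some(Status::Unspliced),
        (true, true) => Some(Status::Ambiguous),
        (false, false) => None,
    }
}

fn main() {
    use TargetKind::{Spliced, Unspliced};
    // Made-up molecules, each given as the target kinds it is compatible with.
    let molecules = vec![
        vec![Spliced],            // only spliced targets
        vec![Unspliced],          // only unspliced targets
        vec![Spliced, Unspliced], // both kinds: splicing status cannot be decided
        vec![Spliced, Spliced],   // two spliced isoforms: still spliced
    ];
    let (mut s, mut u, mut a) = (0, 0, 0);
    for hits in &molecules {
        match classify(hits) {
            Some(Status::Spliced) => s += 1,
            Some(Status::Unspliced) => u += 1,
            Some(Status::Ambiguous) => a += 1,
            None => {}
        }
    }
    println!("spliced={s} unspliced={u} ambiguous={a}"); // spliced=2 unspliced=1 ambiguous=1
}
```

Keeping "ambiguous" as its own bucket, rather than forcing a guess, lets downstream analyses decide how to treat molecules whose splicing status the reads alone cannot settle.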
Jul 28, 2022 • 15 tweets • 3 min read
I've been writing some small tools in Rust for an (exciting) upcoming project. A few thoughts on this experience & what makes it so enjoyable compared to the (fast, compiled, statically-typed) alternatives! This is mostly about the tooling, before even getting to how great the language itself is 🧵
Getting a project started is *trivial*. All of the "boilerplate" is generated automatically by `cargo init`; I don't have to worry about how to set it up, because there is "a way". 2/
Feb 1, 2022 • 24 tweets • 9 min read
@bielleogy k-mers provide a way to compare sequences by directly looking at the composition of the "words" that make them up. A common analogy is to natural language processing and comparing text documents: imagine comparing 2 documents by counting the fraction of words they have in common. 1/
@bielleogy There are many ways to measure this, but a common one is the "Jaccard index", which counts the number of distinct words (k-mers) the two have in common, divided by the total number of distinct words across both. 0 means no words in common; 1 means all words are shared. 2/
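A minimal Rust sketch of the k-mer Jaccard index described above: collect the distinct length-k windows of each sequence, then divide the size of the intersection by the size of the union. The function names are hypothetical, sequences are treated as raw bytes, and returning 0 when neither sequence has any k-mers is a convention chosen here, not something the thread specifies.

```rust
use std::collections::HashSet;

/// Distinct k-mers (length-k windows) of a sequence, treated as raw bytes.
fn kmer_set(seq: &[u8], k: usize) -> HashSet<&[u8]> {
    assert!(k > 0, "k must be at least 1");
    // `windows(k)` yields nothing when the sequence is shorter than k.
    seq.windows(k).collect()
}

/// Jaccard index of the k-mer sets of `a` and `b`:
/// shared distinct k-mers divided by all distinct k-mers across both sequences.
fn jaccard(a: &[u8], b: &[u8], k: usize) -> f64 {
    let (sa, sb) = (kmer_set(a, k), kmer_set(b, k));
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0; // convention chosen here: no k-mers at all means no similarity
    }
    let shared = sa.intersection(&sb).count();
    shared as f64 / union as f64
}

fn main() {
    // 3-mers: {ACG, CGT, GTA, TAC} vs {ACG, CGT, GTT, TTC, TCG}; 2 shared of 7 total.
    println!("{:.4}", jaccard(b"ACGTACGT", b"ACGTTCGT", 3)); // 0.2857
    println!("{:.4}", jaccard(b"ACGT", b"ACGT", 2)); // 1.0000
}
```

Exact sets are fine for short sequences; for large genomes, tools typically estimate the same quantity from compact sketches instead of storing every k-mer.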
Dec 7, 2021 • 5 tweets • 2 min read
@gunesaynasinda In my experience, research is a near-constant roller coaster like this. There are periods of huge productivity, and right after they end you often look over your shoulder, wondering why the wave didn't continue unabated, and feel like your productivity has "slipped". 1/x
@gunesaynasinda But in reality, there are constant ups and downs, and the long-run average is only visible over a time frame much longer than the gap between conference deadlines. Likewise, we tend to look only at the peers who are most productive at the current moment. 2/x
Apr 8, 2020 • 10 tweets • 4 min read
RNA-seq data is often analyzed at the level of genes. This can provide a robust signal, but can also miss out on biologically important information like differences in isoform composition or dominant isoform usage. 1/n
On the other hand, tremendous progress has been made in transcript-level quantification, but some inherent ambiguity can remain in the abundance estimates. This results from patterns of multi-mapping where no inference procedure can accurately resolve the origin of reads. 2/n
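As a toy illustration of why gene-level counts can be robust while transcript-level ones stay ambiguous, the Rust sketch below groups reads by the exact set of transcripts they are compatible with (often called equivalence classes), then tallies how many reads fall to a single gene versus how many are compatible with more than one transcript. The annotation, read groups, and the `resolve` helper are all hypothetical; this is a simplified illustration, not a quantification method.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// How much of the read mass can be assigned without ambiguity at each level.
#[derive(Debug, Default)]
struct Resolution {
    gene_counts: BTreeMap<String, u32>, // reads whose compatible transcripts all belong to one gene
    multi_gene: u32,                    // reads compatible with transcripts from >1 gene
    multi_transcript: u32,              // reads compatible with >1 transcript
}

/// `classes` pairs each set of compatible transcripts with the number of reads sharing it.
/// Panics if a transcript ID is missing from `tx_to_gene`.
fn resolve(classes: &[(Vec<&str>, u32)], tx_to_gene: &HashMap<&str, &str>) -> Resolution {
    let mut res = Resolution::default();
    for (txs, count) in classes {
        let count = *count;
        if txs.len() > 1 {
            res.multi_transcript += count;
        }
        // Map every compatible transcript to its gene and deduplicate.
        let genes: BTreeSet<&str> = txs.iter().map(|t| tx_to_gene[*t]).collect();
        match genes.len() {
            0 => {} // no compatible transcripts: nothing to assign
            1 => {
                let gene = genes.into_iter().next().unwrap();
                *res.gene_counts.entry(gene.to_string()).or_insert(0) += count;
            }
            _ => res.multi_gene += count,
        }
    }
    res
}

fn main() {
    // Hypothetical annotation: two isoforms of gene g1, one isoform of gene g2.
    let tx_to_gene: HashMap<&str, &str> =
        HashMap::from([("g1-t1", "g1"), ("g1-t2", "g1"), ("g2-t1", "g2")]);
    // Made-up read groups: (compatible transcripts, number of reads).
    let classes: Vec<(Vec<&str>, u32)> = vec![
        (vec!["g1-t1"], 10),
        (vec!["g1-t1", "g1-t2"], 30), // isoforms of g1 cannot be told apart
        (vec!["g2-t1"], 5),
        (vec!["g1-t2", "g2-t1"], 2), // even the gene is ambiguous
    ];
    let res = resolve(&classes, &tx_to_gene);
    println!("{:?}", res.gene_counts); // {"g1": 40, "g2": 5}
    println!("multi-transcript reads: {}", res.multi_transcript); // 32
    println!("multi-gene reads: {}", res.multi_gene); // 2
}
```

Here 30 of the 47 reads cannot be split between g1's isoforms without a model, yet all of them count unambiguously toward g1. That gap is the trade-off between robust gene-level signal and informative but uncertain transcript-level estimates.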