Eric Davis @ericscottdavis1 & Wancen Mu @WancenM presenting two branches of functionality in the {nullranges} pkg: finding matched sets of genomic ranges based on covariates & bootstrapping blocks of genomic ranges. Both play well w/ {plyranges} for downstream analysis
Both methods are based on statistical methods for refining null comparisons. E.g. we were inspired by {MatchIt}, {cobalt} and other matching packages, as well as {GSC} for the block bootstrap (method described in Bickel et al 2010)
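To make the block bootstrap idea concrete, here is a minimal Python sketch, not the {nullranges} or {GSC} implementation. It assumes a single chromosome and features given as 0-based half-open (start, end) tuples. The function name `block_bootstrap` and all parameters are hypothetical. Each destination block of the genome is filled with the features from a randomly chosen source block, so local spacing between neighboring features is preserved.

```python
import random

def block_bootstrap(features, genome_length, block_length, rng):
    """Resample features by filling tiled blocks from random source blocks.

    features: list of (start, end) tuples on one chromosome (hypothetical input format).
    Requires block_length <= genome_length.
    """
    n_blocks = -(-genome_length // block_length)  # ceiling division
    resampled = []
    for i in range(n_blocks):
        dest_start = i * block_length
        # Choose a source block uniformly at random (sampling with replacement).
        src_start = rng.randrange(0, genome_length - block_length + 1)
        src_end = src_start + block_length
        shift = dest_start - src_start
        for start, end in features:
            # Keep only features lying entirely inside the source block.
            if start >= src_start and end <= src_end:
                new_start, new_end = start + shift, end + shift
                # Drop features that would run past the end of the genome.
                if new_end <= genome_length:
                    resampled.append((new_start, new_end))
    return sorted(resampled)

# Toy usage: four features on a 1000 bp "genome", blocks of 100 bp.
feats = [(10, 20), (25, 30), (400, 450), (900, 910)]
boot = block_bootstrap(feats, 1000, 100, random.Random(1))
```

Repeating the call many times gives a null distribution for any overlap statistic while keeping the clumping of features within blocks, which is the point of bootstrapping blocks rather than shuffling individual ranges.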
Got an RNA-seq dataset with 50, 100, 200+ samples? Plug it into a differential expression tool and hope for the best? No! You need to consider QC, EDA, and modeling technical variation, or else risk generating spurious results. A thread on papers, methods, and best practices:
Short version: 1) look for outliers (QC) and technical variation with PCA plots 2) consider problems with confounding: model unwanted variation with methods like RUV / SVA / PEER 3) include technical factors in linear model, iterate with respect to positive and negative controls
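A toy illustration of point 3, as a minimal Python sketch with made-up, noise-free data. It is not taken from any of the tools named above, and the helpers `solve` and `ols` are hypothetical. The gene's expression depends only on batch, but batch is partly confounded with condition, so a model without the technical factor reports a spurious condition effect.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up design: 8 samples, batch partly confounded with condition.
condition = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 0, 0, 1, 0, 1, 1, 1]
# Expression driven by batch only: y = 5 + 2 * batch, no true condition effect.
y = [5.0 + 2.0 * b for b in batch]

naive = ols([[1.0, c] for c in condition], y)                          # ~ condition
adjusted = ols([[1.0, c, b] for c, b in zip(condition, batch)], y)     # ~ condition + batch
```

Here `naive[1]` is the condition coefficient without adjustment and `adjusted[1]` is the same coefficient once batch is in the model. Comparing the two is a small-scale version of iterating the model against negative controls.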
This is commonly agreed upon. All of the main workflows for Bioconductor DE tools stress quality control and examination of EDA plots such as PCA before any statistical testing, see e.g.
1. @kwame_forbes wrote DESeq2::integrateWithSingleCell(), which helps users locate publicly available single-cell datasets, followed by visualization with his own R package:
Kwame was then a @UNCPREP scholar, now a first year BCB student at UNC 🧬💻🎉
2. Some Bioc folks and a team at UNC worked on extending the tximeta + DESeq2 + plyranges workflow that @_StuartLee, @lawremi and I started in the fluentGenomics paper:
New preprint from first author Scott Van Buren, we look at various aspects of quantification uncertainty for scRNA-seq counts: interval coverage, trajectory analysis, and DE testing. 1/7
Last year, in the alevin publication, @k3yavi et al showed that assignment of all the reads in scRNA-seq was critical for accurate estimation of abundance across categories of genes by uniqueness. 2/7
And in the Swish publication, @anqiz91 et al showed how bootstrap replicates from alevin could be incorporated into a SAMseq procedure for differential testing across groups of cells. 3/7
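One way to picture using bootstrap replicates in a rank-based test is the minimal Python sketch below. It is not the Swish code. It assumes one gene, a two-group comparison, and that the per-replicate statistic is a Mann-Whitney U averaged over replicates. The function names and toy counts are hypothetical.

```python
def rank_sum_statistic(group_a, group_b):
    """Mann-Whitney U for group_a: number of (a, b) pairs with a > b, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def mean_statistic_over_replicates(replicates, labels):
    """Average the rank statistic across bootstrap replicates of one gene's counts.

    replicates: list of per-replicate count vectors, one value per cell.
    labels: 0/1 group label per cell.
    """
    stats = []
    for counts in replicates:
        a = [c for c, g in zip(counts, labels) if g == 1]
        b = [c for c, g in zip(counts, labels) if g == 0]
        stats.append(rank_sum_statistic(a, b))
    return sum(stats) / len(stats)

# Toy data: 6 cells in two groups, 3 bootstrap replicates of one gene's counts.
labels = [0, 0, 0, 1, 1, 1]
replicates = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 3, 6, 5, 4],
    [1, 4, 2, 3, 6, 5],
]
avg_u = mean_statistic_over_replicates(replicates, labels)
```

In a SAMseq-style procedure, significance would then come from recomputing the averaged statistic under permuted group labels, which is left out here. A gene whose ranking flips between replicates ends up with a less extreme averaged statistic, so quantification uncertainty is reflected in the test.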