Michael Love

Jul 28 • 10 tweets • 5 min read

Amy Willis @AmyDWillis about to present on model misspecification in microbiome studies at #bioc2022

An April '22 preprint from David Clausen and Dr. Willis:

arxiv.org/abs/2204.12733

@AmyDWillis Great and simple motivation from looking at mock communities (over-detection) as to why to move away from:

E(W_ij) = c_i mu_ij (just size factor scaling)

Here i indexes samples and j indexes taxa (wide count tables), so c_i is the scaling factor for sample i.

Now introducing taxon-specific efficiencies e_j to account for over-detection.
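
To make the role of e_j concrete, here is a minimal sketch. It assumes a purely multiplicative mean, E(W_ij) = c_i e_j mu_ij; that form is an assumption for illustration only and is not taken from the slides, and the preprint's full model may contain further terms. The function names and the numbers are hypothetical.

```python
# Hypothetical illustration, not the authors' code.
# Assumed mean model: E(W_ij) = c_i * e_j * mu_ij
#   i = sample, j = taxon, c_i = sample scaling, e_j = taxon efficiency.

def expected_counts(c, mu, e=None):
    """Expected count matrix (samples in rows, taxa in columns)."""
    n_taxa = len(mu[0])
    if e is None:
        e = [1.0] * n_taxa  # size-factor-only model: every taxon detected equally
    return [[c[i] * e[j] * mu[i][j] for j in range(n_taxa)]
            for i in range(len(mu))]

def proportions(row):
    """Observed relative abundances implied by one row of expected counts."""
    total = sum(row)
    return [x / total for x in row]

# Mock community: three taxa present in equal amounts in both samples.
mu = [[1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0]]
c = [1000.0, 5000.0]   # sample-specific scaling (made-up values)
e = [1.0, 4.0, 0.5]    # second taxon over-detected (made-up values)

size_only = expected_counts(c, mu)
with_eff = expected_counts(c, mu, e)

print(proportions(size_only[0]))  # equal proportions under size factors alone
print(proportions(with_eff[0]))   # skewed toward the over-detected taxon
```

With size factors alone, a mock community with equal true abundances can only produce equal expected proportions, whatever c_i is. The efficiencies are what let the model reproduce a taxon that is consistently over-detected.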

Why develop new models specifically for microbiome:

Just realizing that David Clausen = @davidandacat, author of this relevant tweet:

https://twitter.com/davidandacat/status/1547756139936301056

@davidandacat Also this one which sounds like it is about to be relevant with respect to the Poisson modeling assumption:

https://twitter.com/davidandacat/status/1545471461980135427

@davidandacat Estimation involves "a sequence of unconstrained optimizations, permitting solutions progressively closer to the boundary"
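
One standard way to get "a sequence of unconstrained optimizations" that approach a boundary is a log-barrier scheme. The sketch below is an assumption about the general idea, not the authors' algorithm: it minimizes the toy objective (x + 1)^2 subject to x >= 0, whose constrained solution sits exactly on the boundary at x = 0. The objective, the barrier schedule, and the function names are all hypothetical.

```python
# Hypothetical log-barrier sketch, not the estimation routine from the talk.
# Toy problem: minimize (x + 1)**2 subject to x >= 0 (solution: x = 0).
# Each stage minimizes the unconstrained g(x) = (x + 1)**2 - t * log(x),
# and t is decreased so the solution may move closer to the boundary.

def minimize_with_barrier(t, x0, n_iter=100):
    """Newton's method on g(x) = (x + 1)**2 - t * log(x), defined for x > 0."""
    x = x0
    for _ in range(n_iter):
        grad = 2.0 * (x + 1.0) - t / x
        hess = 2.0 + t / (x * x)
        step = grad / hess
        while x - step <= 0.0:  # shrink the step so x stays strictly positive
            step *= 0.5
        x -= step
    return x

def barrier_path(ts, x0=1.0):
    """Solve the sequence of problems, warm-starting each from the last."""
    path, x = [], x0
    for t in ts:
        x = minimize_with_barrier(t, x)
        path.append(x)
    return path

ts = [10.0 ** (-k) for k in range(7)]  # t = 1, 0.1, ..., 1e-6
solutions = barrier_path(ts)
for t, x in zip(ts, solutions):
    print(f"t = {t:g}: x = {x:.7f}")
```

Each stage is an ordinary unconstrained problem on x > 0, and shrinking t lets the minimizer drift toward the boundary value without ever stepping outside the parameter space.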

Then, to obtain limiting distributions when the true parameter may be on the boundary of the parameter space, they use a weighted bootstrap with weights w ~ Dirichlet
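
A weighted bootstrap with Dirichlet weights can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure from the talk: it uses Dirichlet(1, ..., 1) weights (the concentration is not given in the thread), and a toy estimator, a weighted mean truncated at 0, standing in for a parameter that may lie on the boundary. In the actual method the weights would presumably reweight the fitting objective, which is then re-optimized.

```python
import random

# Hypothetical sketch of a Dirichlet-weighted bootstrap.
# Assumptions: Dirichlet(1, ..., 1) weights and a toy truncated-mean estimator.

def dirichlet_weights(n, rng, alpha=1.0):
    """Draw Dirichlet(alpha, ..., alpha) weights by normalizing Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]

def constrained_mean(x, w):
    """Weighted mean truncated at 0: a toy estimator with a boundary at 0."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)))

def weighted_bootstrap(x, n_boot=2000, seed=1):
    """Recompute the estimator under fresh random weights n_boot times."""
    rng = random.Random(seed)
    n = len(x)
    return [constrained_mean(x, dirichlet_weights(n, rng)) for _ in range(n_boot)]

x = [-0.3, 0.1, 0.4, -0.2, 0.05, 0.0, -0.1, 0.2]  # made-up data, mean near 0
draws = sorted(weighted_bootstrap(x))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws)) - 1]
print(f"95% interval from the weighted bootstrap: [{lower:.3f}, {upper:.3f}]")
```

Unlike resampling rows with replacement, every observation keeps a strictly positive weight in every replicate, so the reweighted problem stays close to the original one.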

@davidandacat Dr. Willis on her thoughts on interesting directions for future work in modeling microbiome data

All of the methods shown here are available at this github link:

github.com/statdivlab/tin…

Collaborators and contributors:

More from @mikelove

Michael Love

@mikelove

Jul 27

Eric Davis @ericscottdavis1 & Wancen Mu @WancenM presenting two branches of functionality in the {nullranges} pkg: finding matched sets of genomic ranges based on covariates & bootstrapping blocks of genomic ranges. Both play well w/ {plyranges} for downstream analysis

#bioc2022

Both methods are based on statistical methods for refining null comparisons. E.g. we were inspired by {MatchIt}, {cobalt} and other matching packages, as well as {GSC} for the block bootstrap (method described in Bickel et al 2010)

Development and conference attendance for Eric and Wancen were supported by an EOSS award from @cziscience. Contributions from @dphansti, @mikhaildozmorov, @_StuartLee, @timtriche and other members of the #nullranges Slack channel

Previously described here:

https://twitter.com/mikelove/status/1495390667123695618

Michael Love

@mikelove

Apr 11

Got an RNA-seq dataset with 50, 100, 200+ samples? Plug it into a differential expression tool and hope for the best? No! You need to consider QC, EDA, and modeling technical variation, or else risk generating spurious results. A thread on papers, methods, and best practices:

Short version:
1) look for outliers (QC) and technical variation with PCA plots
2) consider problems with confounding: model unwanted variation with methods like RUV / SVA / PEER
3) include technical factors in linear model, iterate with respect to positive and negative controls
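
Step 1 of the short version can be sketched without any packages. This is a toy illustration under assumptions, not a recommended pipeline: it log-transforms a made-up count table, centers each gene, and finds first principal component scores by power iteration. The data and function names are hypothetical, and a real analysis would use the PCA plotting tools in the workflows linked in this thread.

```python
import math

# Hypothetical QC sketch: PC1 scores per sample from a small count table.

def log_center(counts):
    """log2(count + 1), then center each gene (column) across samples."""
    logged = [[math.log2(v + 1.0) for v in row] for row in counts]
    n, p = len(logged), len(logged[0])
    means = [sum(logged[i][j] for i in range(n)) / n for j in range(p)]
    return [[logged[i][j] - means[j] for j in range(p)] for i in range(n)]

def first_pc_scores(x, n_iter=200):
    """Unit-norm PC1 scores per sample via power iteration on the Gram matrix."""
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

# Made-up counts: 5 samples (rows) x 4 genes (columns); the last sample is odd.
counts = [
    [100, 200, 50, 10],
    [110, 190, 55, 12],
    [95, 210, 48, 9],
    [105, 205, 52, 11],
    [400, 20, 300, 90],
]
scores = first_pc_scores(log_center(counts))
outlier = max(range(len(scores)), key=lambda i: abs(scores[i]))
print("PC1 scores:", [round(s, 3) for s in scores])
print("Sample furthest from the rest on PC1 (0-based index):", outlier)
```

A sample that separates from all others on the first component, as the last row does here, is a candidate for closer inspection before any testing.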

This is commonly agreed upon. All of the main workflows for Bioconductor DE tools stress quality control and examination of EDA plots such as PCA before any statistical testing, see e.g.

f1000research.com/articles/4-1070
f1000research.com/articles/5-1408
f1000research.com/articles/5-1438

Michael Love

@mikelove

Feb 20

This #EOSS funding from @cziscience for #DESeq2 and #tximeta wrapped up at the end of 2021.

Reporting in this 🧵 on what we developed:

https://twitter.com/mikelove/status/1330840564091252737

1. @kwame_forbes wrote DESeq2::integrateWithSingleCell(), which helps users locate publicly available SC datasets, followed by visualization with his own R package:

kwameforbes.github.io/vizWithSCE/

Kwame was then a @UNCPREP scholar, now a first year BCB student at UNC 🧬💻🎉

2. Some Bioc folks and a team at UNC worked on extending the tximeta + DESeq2 + plyranges workflow that @_StuartLee @lawremi and I started in the fluentGenomics paper:

sa-lee.github.io/fluentGenomics/

Michael Love

@mikelove

Jul 7, 2020

New preprint from first author Scott Van Buren, we look at various aspects of quantification uncertainty for scRNA-seq counts: interval coverage, trajectory analysis, and DE testing. 1/7

biorxiv.org/content/10.110…

Last year, in the alevin publication, @k3yavi et al showed that assignment of all the reads in scRNA-seq was critical for accurate estimation of abundance across categories of genes by uniqueness. 2/7

genomebiology.biomedcentral.com/articles/10.11…

And in the Swish publication, @anqiz91 et al showed how bootstrap replicates from alevin could be incorporated into a SAMseq procedure for differential testing across groups of cells. 3/7

academic.oup.com/nar/article/47…
