So, this preprint came out today ..
bit.ly/2K4jKzK
Let me (not so) briefly explain a few things ..
first, let me say it's been a super fun project and a huge tour-de-force from our very talented @CrowellHL .. and lucky for us, we benefit from the ever-present wisdom of @CSoneson, some great insights from Pierre-Luc Germain along the way and great collaborators at Roche.
So .. people talk a lot about differential expression in scRNA-seq data, but from what I can tell, most of it is about how to find marker genes. That's indeed interesting, but not what we are on about here. Here, we're interested in *sample-level* comparisons of scRNA-seq ..
meaning, we have a proper replicated experiment, multi-sample, multi-condition and we are profiling multiple cell subpopulations (e.g., tissue, blood, etc.). Here, we're interested in following a subpopulation (e.g., T cells) across samples, say to compare treated vs control.
We call this *differential state* analysis. The cells are already organized into "types" (ok, this requires a computational analysis) and we look for differences in expression in the cells of treated samples versus cells of control samples; we do this one subpopulation at a time.
Turns out there are many ways to do this: mixed models, "pseudobulks", comparing distributions. So, we wanted to survey how to best do this.
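To make the "pseudobulk" idea concrete, here is a minimal sketch, assuming a plain cells-by-genes count matrix with one sample label and one subpopulation label per cell. The function name `pseudobulk` and the toy data are hypothetical illustrations, not the code from the paper or the software package. It only shows the aggregation step: sum raw counts over all cells that share a (subpopulation, sample) pair, so that each subpopulation ends up with one bulk-like profile per sample.

```python
import numpy as np

def pseudobulk(counts, sample_ids, cluster_ids):
    """Sum raw counts over all cells sharing a (cluster, sample) pair.

    counts      : (n_cells, n_genes) array of raw counts
    sample_ids  : length-n_cells sample label per cell
    cluster_ids : length-n_cells subpopulation label per cell
    Returns a dict mapping (cluster, sample) -> length-n_genes count vector.
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    cluster_ids = np.asarray(cluster_ids)
    out = {}
    for k in np.unique(cluster_ids):
        for s in np.unique(sample_ids):
            mask = (cluster_ids == k) & (sample_ids == s)
            if mask.any():  # skip pairs with no cells
                out[(str(k), str(s))] = counts[mask].sum(axis=0)
    return out

# Toy data (made up): 6 cells, 2 genes, 2 samples, 2 subpopulations.
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4], [1, 1], [0, 2]])
samples = ["ctrl1", "ctrl1", "trt1", "trt1", "ctrl1", "trt1"]
clusters = ["T", "T", "T", "T", "B", "B"]

pb = pseudobulk(counts, samples, clusters)
```

With several samples per condition, the per-subpopulation pseudobulk profiles can then be compared treated vs control with whatever bulk-style test one prefers; that testing step is deliberately left out of this sketch.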
First hurdle was simulating data in a way that maintains the properties of scRNA-seq but also accounts for sample-to-sample variability.
Not easy, but after some iterations (we test it against real datasets using @CSoneson's countsimQC package + some pseudobulk-level dispersion-mean plots to check sample-sample variability), I think we have a very flexible simulation ..
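For intuition only, here is a toy sketch of what "sample-to-sample variability" can mean in a count simulation. This is not the simulation from the paper; it assumes a simple model in which each sample gets its own gene-wise log-normal shift of the mean, and cells within a sample are drawn from a negative binomial (written as a gamma-Poisson mixture). The function name `simulate_counts` and all parameter values (`sample_sd`, `dispersion`, the gamma prior on baseline means) are made up for illustration.

```python
import numpy as np

def simulate_counts(n_genes, n_samples, cells_per_sample,
                    sample_sd=0.3, dispersion=0.5, seed=0):
    """Toy multi-sample count simulation (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Baseline mean expression per gene (arbitrary gamma prior).
    base_mean = rng.gamma(shape=2.0, scale=1.0, size=n_genes)
    counts, sample_ids = [], []
    for s in range(n_samples):
        # Sample-level effect: gene-wise shift on the log scale.
        sample_effect = rng.normal(0.0, sample_sd, size=n_genes)
        mu = base_mean * np.exp(sample_effect)
        # Negative binomial as gamma-Poisson: mean mu, variance mu + dispersion * mu^2.
        lam = rng.gamma(shape=1.0 / dispersion, scale=mu * dispersion,
                        size=(cells_per_sample, n_genes))
        counts.append(rng.poisson(lam))
        sample_ids += [s] * cells_per_sample
    return np.vstack(counts), np.array(sample_ids)

counts, sample_ids = simulate_counts(n_genes=5, n_samples=4, cells_per_sample=10)
```

Setting `sample_sd=0` in this toy model removes the between-sample layer, so cells from all samples are drawn around the same gene means; a realistic simulation would instead estimate such parameters from a reference dataset.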
After that, we do the bakeoff. Many more details in the paper, but overall pseudobulks do quite well and are fast. Mixed models work well too, but at a bit of extra compute cost. Overall, I think there is scope for new methods too, especially if there is concern about aggregating.
We also did a nice analysis of an snRNA-seq dataset from mouse cortex tissue .. replicates of mice +/- peripheral LPS treatment .. and also, full code for the pipeline: integration, clustering, annotation, differential state analysis, subpopulation-level geneset analysis and more
Software: bit.ly/2YoCnax
Code: bit.ly/2Mj9mXB

Feedback welcome!

I will speak about this at #JSM2019 next week and we have a workshop on the topic @BC2Conference ..

Keep Current with Mark Robinson
