A thread about a landmark reproducibility and usability achievement by @JaseGehring for his @NatureBiotech paper "Highly multiplexed single-cell RNA-seq by DNA oligonucleotide tagging of cellular proteins." The reproducibility repository is here: github.com/pachterlab/GPC…. 1/19
@JaseGehring @NatureBiotech For reference, the paper is nature.com/articles/s4158…. It’s the first technology work published by our lab (!) and although the focus is on the experiments and protocols, data analysis is a key part of the paper. 2/19
@JaseGehring @NatureBiotech I have some things to say about reproducibility, but before getting into details go reproduce *and use* our work for yourself. E.g. @JaseGehring has made analysis of the 96-plex experiment in the paper completely transparent: colab.research.google.com/github/pachter… Just click 'Run all'! 3/19
@JaseGehring @NatureBiotech People talk a lot about reproducibility…but what does it really mean and why is it important? Reproducibility is not just about verification; it’s about usability: sharing data and analysis in a way that makes it possible to build on published work: liorpachter.wordpress.com/2014/03/18/rep… 4/19
@JaseGehring @NatureBiotech The value of reproducibility and usability has been repeatedly demonstrated. There are numerous examples showing that usability directly accelerates science. 5/19
@JaseGehring @NatureBiotech Obviously, at a minimum, reproducibility, let alone usability, requires sharing data and analysis code. However, many authors are resistant to even this basic standard despite sincere efforts from journals to raise the reproducibility bar. nature.com/articles/nbt.2… 6/19
@JaseGehring @NatureBiotech For example, this recent paper nature.com/articles/s4158… does not make code (binary or source!) available; currently just getting access to an online server that runs the code requires registration. 7/19
@JaseGehring @NatureBiotech There is also a restrictive license. Users are prohibited from using the software “for diagnosis, treatment, cure, prevention, or mitigation of disease or other conditions in man or other animals”. That is, from applying it in practice. cibersortx.stanford.edu/register.php 8/19
@JaseGehring @NatureBiotech Yet source code and data sharing is so fundamental to accelerating science that our community standard should be that both are shared *even for preprints*. liorpachter.wordpress.com/2019/10/21/zer… 9/19
@JaseGehring @NatureBiotech While source code and data sharing is necessary, it is not sufficient. The complexity of current experiments requires details of metadata, compute environment, and analysis code to be transparent. There are some innovative solutions being developed: nature.com/articles/nbt.3… 10/19
@JaseGehring @NatureBiotech Anecdotally, I've found some groups try harder than others; e.g., GSE118614 (ncbi.nlm.nih.gov/geo/query/acc.…) from Clark et al. 2019 has well-organized metadata. But for the most part, it’s currently very difficult to use published data, even when analysis code is made available. 11/19
@JaseGehring @NatureBiotech One way to facilitate usability is to develop analysis workflows that work with free cloud compute such as @GoogleColab. @sinabooeshaghi has been pushing the limits of what can be done within its confines as part of our kallisto | bustools project: biorxiv.org/content/10.110…. 12/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi During the past few months, @sinabooeshaghi, along with two outstanding undergraduates in the lab whom he's supervised (@lioscro and Lauren Liu), has produced a series of lean @GoogleColab workflows, including for complex analyses such as RNA velocity: kallistobus.tools/tutorials 13/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro As optimistic as I've been about the potential of @GoogleColab, it was a leap to believe that it could be used to provide reproducibility for an entire paper. Well... a monumental effort by @JaseGehring (welcome to @twitter!) proves that it's possible! github.com/pachterlab/GPC… 14/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro @Twitter Putting this out there is scary. The ease of reproducibility facilitates someone finding something suboptimal or in error in our work. But that's what reproducibility and usability are all about, and better this way than "available upon request". 15/19 media.giphy.com/media/ycJqRIuu…
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro @Twitter Acknowledgments: Thanks to @yarbsalocin, @hjpimentel and @pmelsted for taking reproducibility and usability to a whole other level with kallisto and sleuth. Those were the first projects where we Snakemaked everything starting with downloading the data. 16/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro @Twitter @yarbsalocin @hjpimentel @pmelsted kallistobus.tools was key to getting the analysis working in @GoogleColab. This was possible thanks to extraordinary optimization work & development of bustools by @pmelsted & @sinabooeshaghi. The paper uses Cell Ranger since kallisto | bustools wasn't preprinted yet. 17/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro @Twitter @yarbsalocin @hjpimentel @pmelsted Specifically, @sinabooeshaghi's demonstration of RNA velocity in @GoogleColab (liorpachter.wordpress.com/2019/07/01/hig…) was a key technical achievement and "Aha!" moment that made this possible. @JaseGehring's work on the reproducibility & usability of the paper was a project unto itself. 18/19
@JaseGehring @NatureBiotech @GoogleColab @sinabooeshaghi @lioscro @Twitter @yarbsalocin @hjpimentel @pmelsted A final note: the notebooks at colab.research.google.com/github/pachter… take several hours to run, but we expect this kind of infrastructure will improve greatly in the next several years. This is just a proof-of-concept. 19/19