The 17 #BICCN @nature papers on the primary motor cortex in mouse (+some human & marmoset) that were published yesterday are a major step forward in terms of open science for an @NIH consortium. For reference, links to the open access papers are here: nature.com/collections/ci… 1/🧵
First, the #BICCN required preprints of all the papers to be posted on @biorxivpreprint, and as a result the papers were already online 1-1.5 years ago. Of course the final versions now published have been revised in response to peer review. 2/
Speaking of peer review, almost all the papers were published along with the reviews. In combination with the preprints, this provides an unprecedented view of how consortium work is reviewed and how authors respond. Real data for this perennial debate: 3/
Some of the reviews were superficial. For example referee #1 of the “flagship” paper (nature.com/articles/s4158…) wrote one paragraph summarizing the work + 3 minor comments (for a paper whose goal was to synthesize results from complex data published in 11 other papers!). 4/
Some reviews were brutally honest. Referee #1 of nature.com/articles/s4158… wrote "what we have..is a very well collected catalogue utterly devoid of either a conceptual framework or even an idea... the experience is like reading a phone book". They signed the review (@blamlab). 5/
Some reviews were serious(ly helpful). E.g., in our paper (one of the 17, namely @sinabooeshaghi et al., nature.com/articles/s4158…), a referee was concerned about batch effects & artifacts, leading us into a deep dive that revealed batch effects in the consortium #scRNAseq data. 6/
This helped us clean up our analysis in an important way. 7/
This is not the first time reviews of papers have been published (@eLife has been doing this for a while), but having the referee reports (+ responses & non-responses) exposed for an entire consortium's worth of papers provides a dataset ripe for study (& beyond the scope of this thread). 8/
Another aspect of open science is freely available data. In that regard, the #BICCN consortium has been exemplary. All the data generated is freely available, for example the #scRNAseq data used in @booeshaghi et al. is here (and has been for years): data.nemoarchive.org/biccn/grant/u1… 10/
However, as pointed out in a recent paper by @autobencoder, @michaelhoffman, @markowetzlab, @suinleelab, @GreeneScientist, and @stephaniehicks, data is not enough. Models and code are also essential and therefore a key part of open science. 11/ nature.com/articles/s4159…
Unfortunately, despite the fact that computational methods (including machine learning tools) are an essential piece of the #BICCN, many of the consortium papers fail to even medal by the standards of @autobencoder et al. 12/
Many papers released no code to reproduce results or figures from their papers, and omitted key analysis details. This is not specific to #BICCN; it reflects a widespread belief in the genomics community that data trumps methods, and a rejection of the idea that #methodsmatter. 13/
However, most of the data for the 17 @nature papers published by the #BICCN was generated quickly, and it's the analysis that has taken several years. The difficulty of analysis can be seen in the papers that did release code (some achieved bronze 🥉). 14/
See, e.g., github.com/AllenInstitute… from Bakken et al. (nature.com/articles/s4158…), which shows just how challenging it is to analyze the #BICCN data (and how useful it is to have the code). 15/
The #BICCN datasets were so large that it was challenging to enable reproducibility. In our paper (nature.com/articles/s4158…) @booeshaghi struggled to achieve the "one-click" reproducibility with @GoogleColab that we strive for. I'd say we achieved bronze trending towards silver. 16/
In summary, the steps taken towards open science by the #BICCN represent real progress. Having now participated in consortia from the mouse genome (2002) to the mouse brain (2021), I can say the progress is astounding. But we're still not at platinum. 17/17

