but I propose an additional platinum standard for one click reproducibility. 1/
By "one click", I mean that the entire analysis be reproducible in a (free) interactive online session of @colab (or other similar service). All steps of the analysis, from downloading data to generating figures, are then not only automated but also accessible to users. 2/
In some cases programs may be too resource intensive to run directly on a "light cloud" service such as @GoogleColab, but the output from those steps can then be loaded into @GoogleColab or an equivalent, making immediate exploration of results possible for users. 4/
The difference between "one command" and "one click" is substantial. While the former is a very high (& excellent) bar for reproducibility, it leaves the barrier of actually getting everything to run on suitable hardware. We've found that lowering that barrier is empowering. 5/
We started, as a lab, to learn how to move from gold standard to what I am calling platinum with nature.com/articles/s4158… by @JaseGehring et al. Getting this right has been challenging and we're still learning, but it's been worthwhile, we think, for others and ourselves. 6/
In response to questions & comments by @hippopedoid, @adamgayoso, @akshaykagrawal et al. on "The Specious Art of Single-Cell Genomics", Tara Chari & I have posted an update with some new results. Tl;dr: definitely time to stop making t-SNE & UMAP plots.🧵biorxiv.org/content/10.110…
In a previous thread I talked about the (von Neumann) elephant in the dimension reduction room: t-SNE & UMAP don't preserve local or global structure, they distort distances, and they are arbitrary. Almost everybody knows this but they are used anyway...
There were some interesting technical questions about our work. One question was the extent to which PCA pre-conditioning affects results. We examined this (Supp. Fig. 3). Tl;dr: it's time to stop making t-SNE & UMAP plots (with or without PCA pre-conditioning).
It's time to stop making t-SNE & UMAP plots. In a new preprint w/ Tara Chari we show that while they display some correlation with the underlying high-dimensional data, they don't preserve local or global structure & are misleading. They're also arbitrary.🧵biorxiv.org/content/10.110…
On t-SNE & UMAP preserving structure: 1) We show massive distortion by examining what happens to equidistant cells and cell types. 2) Neighbors aren't preserved. 3) Biologically meaningful metrics are distorted. E.g., see below:
These distortions are inevitable. Cells or cell types that are equidistant in high dimensions must exhibit increasing distortion in a two-dimensional embedding as they increase in number. In fact, UMAP and t-SNE distortions are even worse (much worse!) than the lower bounds from theory.
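To make the "inevitable" part concrete, here is a minimal sketch. It is not the analysis from the preprint, just a standard disk-packing argument, and every function name in it is made up for illustration. Take n points that are exactly equidistant in n dimensions (the vertices of a simplex) and put them in the plane in any way you like. Disks of radius d_min/2 around the points cannot overlap, and they all fit inside one disk of radius d_max + d_min/2, so d_max/d_min must be at least (sqrt(n) - 1)/2. In the original space that ratio is 1. The code checks the bound against one arbitrary 2D layout, a square grid, assumed here only as a stand-in for an embedding.

```python
import numpy as np


def pairwise_distances(points):
    """Return the distances between all unordered pairs of rows in `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    return dists[upper]


def distortion_ratio(points):
    """Largest pairwise distance divided by the smallest one.

    For truly equidistant points this ratio is 1.
    """
    d = pairwise_distances(points)
    return d.max() / d.min()


def packing_lower_bound(n):
    """Lower bound on d_max / d_min for any n distinct points in the plane.

    Follows from packing n disjoint disks of radius d_min / 2 into a disk
    of radius d_max + d_min / 2.
    """
    return (np.sqrt(n) - 1) / 2


def grid_layout(n):
    """Hypothetical 2D 'embedding': place n points on a square grid."""
    side = int(np.ceil(np.sqrt(n)))
    return np.array([(i % side, i // side) for i in range(n)], dtype=float)


if __name__ == "__main__":
    for n in (4, 16, 64, 144):
        simplex = np.eye(n)       # n equidistant points in n dimensions
        layout = grid_layout(n)   # the same number of points laid out in 2D
        print(
            n,
            distortion_ratio(simplex),
            distortion_ratio(layout),
            packing_lower_bound(n),
        )
```

The ratio for the 2D layout grows with n and never drops below the packing bound, whatever layout is substituted for `grid_layout`. This only shows that some distortion is forced by geometry; it says nothing by itself about how much worse t-SNE or UMAP do than the bound.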
While it’s fun to banter about what constitutes a good lab, the part of this that is uncomfortable to discuss is that leaving a bad lab is in many cases near impossible. Few universities offer much support and PIs can and do retaliate, in some cases ending careers.
My first committee meeting for a biology student @UCBerkeley, when I was still a junior prof., resulted in the student breaking down in tears as he told us of abuse his advisor was inflicting on him. We brought this up with the advisor and department.
What happened? A few years later the professor was promoted to chair of the department.
If you're working on spatial transcriptomics, I think you'll find @LambdaMoses' "Museum of Spatial Transcriptomics", which analyzes the field via its metadata, to be an incredibly useful resource. biorxiv.org/content/10.110… 1/11
The museum is organized as a main paper that provides an overview of a book (i.e. the Supplementary Material) which is based on a database of papers in the field compiled by @LambdaMoses. First the database... docs.google.com/spreadsheets/d…
It contains several hundred papers. 2/11
To undertake a comprehensive study of the field, @LambdaMoses read all these papers carefully, starting with "prequel" literature to establish historical context. The database has detailed metadata including a summary of each paper. This timeline is just of the prequel. 3/11
Yesterday I posted a piece about @OrchidInc's polygenic embryo selection. I thought, based on a press release I read, that they were the first company to undertake polygenic embryo selection. 1/ liorpachter.wordpress.com/2021/04/12/the…
The press release started w/ "Orchid, the first preconception system to quantify how a couple's genetics impacts their future child's health, today announced a $4.5M seed round..". It went on to describe the company's polygenic embryo selection product. 2/ prnewswire.com/news-releases/…
I naïvely assumed that Orchid is the first company to embark on polygenic embryo selection, but TIL that is not the case. In fact, more than two years ago, an article in @TheEconomist discussed myome.
In September I wrote a blog post reciting several false #covid19 claims and predictions made by Levitt over the course of the pandemic. That is not an "ad hominem attack". I reported Levitt's claims (with references). liorpachter.wordpress.com/2020/09/21/the… 2/14
Levitt, for his part, has responded to criticism of his failed predictions with non-sequiturs about attacks on free speech.