In response to criticism of the lack of any women on the recent @numpy_team paper, the authors have floated a narrative that this is the result of "societal constraints", and meager origins of the project. The truth does not abide.
Let's start with a bit of history. NumPy has its origins in code developed in the 1990s, with the first official version released by Travis Oliphant (@teoliphant) in 2006. Kudos to him for an important effort; NumPy has had a huge impact on scientific software. 2/
However, the idea that all the developers were men because of "societal constraints", that there just weren’t any interested women, and that they’ve always wanted to work with women but just couldn’t because they were not funded... that's just baloney. 3/
In 2006, when he released v1.0 of NumPy, Travis Oliphant was an assistant professor of Electrical and Computer Engineering at @BYU. His research was on applied math / numerical analysis / scientific computing for engineering and biomedical problems. 4/
Taking a look at his publications (scholar.google.com/citations?hl=e…), we see that he has authored 54 papers. In total he has had dozens of coauthors on numerous articles in a variety of areas. Out of his 54 he co-authored with a woman exactly twice 😱. 5/
The SciPy paper has 1 woman out of 33 authors. I leave it as an exercise for the reader to find the other paper that has a single woman author. 6/
Misogyny is not created in a vacuum. @teoliphant was a faculty member in a misogynist department. His department at @BYU had zero women out of 25 faculty when he worked there. 7/ web.archive.org/web/2005102700…
Even today, in 2020, with 24 tenure track or tenured faculty in the department, there is only 1 woman. There is one more woman who is the one and only adjunct professor in the department. ece.byu.edu/faculty 8/
One is reminded of this poster from the math department @byu a few years ago: 9/
Turning to funding, shortly after releasing the first version of NumPy, Oliphant became president of Enthought, which was specifically devoted to supporting and developing NumPy for Scientific Computing. Enthought was not NumPy, but NumPy work supported Enthought’s mission. 10/
There is no doubt that many programmers contributed their time, uncompensated, to NumPy. But NumPy has not been without funding. In 2013, DARPA dished out $3 million for Python “Big Data” development informationweek.com/software/infor… including funding for @teoliphant and Blaze. 11/
This work was done at a company, Continuum, that Oliphant founded for the purpose of developing NumPy, and where he employed key developers of NumPy. Continuum is now called Anaconda. anaconda.com 12/
Meanwhile Oliphant moved on, founding Quansight, where he again employs several key developers of @numpy_team. The leadership team at Quansight is 12 men and 1 woman. quansight.com/about-us 13/
Ralf Gommers (@ralfgommers), who blocked me for talking about the lack of women on the NumPy paper, works for @teoliphant at Quansight. He is the director of Quansight Labs. 14/
All of this is *before* NumPy receives @cziscience funding. And the funding goes to… Ralf Gommers at *Quansight*. Remember “societal constraints” and “volunteers”… give me a break. 15/ chanzuckerberg.com/eoss/proposals…
Is NumPy great software? Absolutely.
Do I use it? Yes (I also enjoy listening to Wagner).
Is the lack of women on the NumPy paper just an unfortunate result of societal constraints, a leaky pipeline, and lack of resources of some guys in a garage? I don't think so. 16/
A constructive path forward can take many forms, but it has to begin with an honest accounting of the past. 17/
The OSS community has done many great things, and it is an amalgamation of many teams and projects so one cannot generalize about it. But the idea that all of it is ethical and equitable is nonsense. 18/
It's been great to see the positive response of @satijalab & @fabian_theis to our preprint on Seurat & Scanpy, and their commitment to work to improve transparency of their tools. One immediate benefit will be better practice of PCA in genomics. 1/🧵 biorxiv.org/content/10.110…
PCA became a mainstay in genomics after the papers of @soumya_boston, Josh Stuart & @Rbaltman and @OrlyAlter ca. 2000 demonstrated its power for studying gene expression. 2/ worldscientific.com/doi/abs/10.114… pnas.org/doi/10.1073/pn…
Back then, having linear algebra on one's side was essential. A rich lab at that time might have something like a Sun Blade workstation clocking ~500 MHz w/ 2 GB RAM. So having fast SVD algorithms made PCA practical, when other methods based on more sophisticated models weren't. 3/
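For readers who want to see why a fast SVD is all PCA needs, here is a minimal sketch, assuming NumPy and a toy random matrix. The function name `pca_svd` and the data are illustrative only; this is not the code from the papers cited above.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of X (rows = samples, columns = features) via a thin SVD.

    Hypothetical helper for illustration only.
    """
    # Center each feature so the SVD reflects variance around the mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                    # principal axes
    scores = U[:, :n_components] * s[:n_components]   # projected samples
    # Variance along each axis, using the unbiased n - 1 normalization.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Toy data (an assumption): 200 samples by 50 features of Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
scores, components, explained_variance = pca_svd(X, 2)
```

The explained variances should match the top eigenvalues of the sample covariance matrix, which is the textbook definition of PCA; the SVD route just avoids forming that matrix explicitly.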
The difference in @10xGenomics' Cell Ranger's default between version 6 and 7 is discussed in this thread, but it's such a big deal that it's worth its own thread.
tl;dr: in v7 Cell Ranger changed how it produces the gene count matrix leading to a huge difference in results. 1/
The change was described in release notes on May 17, 2022, which via two clicks lead to a technical note with more detail: 2/ cdn.10xgenomics.com/image/upload/v…
To understand this technical note it is helpful to be familiar with the three types of reads that are produced in single-cell RNA-seq: spliced (M as a proxy for mature mRNAs), unspliced (N as a proxy for nascent RNAs), and ambiguous between both (labeled A). 3/
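A toy sketch of why the choice of read classes matters, assuming per-gene, per-cell counts are available separately for M, N and A. The numbers, the function name `count_matrix` and the `include_nascent` flag are all hypothetical; which classes a given Cell Ranger version actually counts is what the technical note describes, not this snippet.

```python
import numpy as np

# Hypothetical counts: rows are genes, columns are cells.
M = np.array([[5, 0], [2, 1]])  # spliced reads (proxy for mature mRNA)
N = np.array([[3, 1], [0, 4]])  # unspliced reads (proxy for nascent RNA)
A = np.array([[1, 1], [1, 0]])  # reads ambiguous between the two

def count_matrix(M, N, A, include_nascent):
    """Combine read classes into one gene count matrix (illustrative only)."""
    total = M + A
    if include_nascent:
        total = total + N
    return total

without_nascent = count_matrix(M, N, A, include_nascent=False)
with_nascent = count_matrix(M, N, A, include_nascent=True)
```

Even in this tiny example the second gene in the second cell goes from 1 count to 5 depending on the choice, which is the kind of shift that can change downstream results.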
The choice of whether to use Seurat or Scanpy for single-cell RNA-seq analysis typically comes down to a preference of R vs. Python. But do they produce the same results? In a preprint w/ @Josephmrich et al. we take a close look. The results are 👀 1/🧵 biorxiv.org/content/10.110…
We looked at a standard processing / analysis summarized in the figure below. The sources of variability we explored are in red. The plots and metrics we assessed are in blue. We examined the standard benchmark 10x PBMC datasets, but results can be obtained for other data. 2/
Before getting into results it's important to note that Seurat has never been published, and many of the details of Scanpy are missing in its original paper. @Josephmrich read the code & traced every function and every parameter. E.g., this is how Clustering / UMAPs are made: 3/
My blog passed 3 million views today from more than 1.8 million visitors. There have been a total of 119 posts in just over 10 years.
I'm one of those visitors. The blog is an idea repository and I go back sometimes for recall. Some highlights 1/🧵 liorpachter.wordpress.com
Just today I revisited the PCA post to recall some of the properties of the transform. A student, Nick Markarian, taught me the Borel-Kolmogorov paradox today (topic for a future post) and the post was helpful in thinking about some things. 2/ liorpachter.wordpress.com/2014/05/26/wha…
This year I had the privilege of enjoying in-person conferences again, and in April I met @dvir_a & Dan Gorbonos, from whom I learned a bunch of interesting science. Here we are having burgers at Hans im Glück in Bonn.
And now, a 🧵about genocide.. 1/
The topic came up at dinner. History presents a heavy burden for Jews in Bonn.. even 78 years after WWII. The "Hans in luck" restaurant we were dining at is just a few meters from where the local synagogue was burned down during "Kristallnacht" in 1938. 2/
Although decades have passed since the Holocaust, in Bonn the events felt closer in time. We were attending the Bonn Conference on Mathematical Life Sciences, which held a moment of silence for Holocaust Remembrance Day while we were there. 3/