Tweetorial time! We @RecursionPharma mapped the consequences of #CRISPR screening of >17K human genes, found a systematic bias confounding all CRISPR screens, traced its molecular cause, and proposed a debiasing algorithm.
“But Imran,” you say, “I’d rather read your thrilling 41-page manuscript than read tweet threads!”
In this first tweetorial, I’ll share some of the foundations of the similarity-based “maps” we build @RecursionPharma as background for what we found out about CRISPR by building a map over the whole genome.
We use CRISPR to knock out individual genes and measure the consequences by “phenomics”: imaging analysis of cellular morphology and intracellular organization.
This “maps” biology: knockouts of related genes produce similar phenotypic consequences!
Here you see a similarity map of ~50 KOs of genes in conserved pathways, showing that related genes cluster with each other. This works for a lot of therapeutically interesting pathways, and we can do it not just for KOs but also for chemical treatments for drug discovery.
To use maps for drug discovery, you may start with a single gene, and then ask the map which other genes (or compounds) look similar to identify interesting starting points or targets. You would like similarities to be biologically meaningful, and that’s what we see above.
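For the code-minded, here is a minimal sketch of what "asking the map" could look like. It assumes each knockout is summarized by one embedding vector and that similarity is cosine similarity; the thread does not name the metric, so that choice is an assumption. The gene names, the toy vectors, and the helpers `cosine_similarity_matrix` and `most_similar` are hypothetical and for illustration only.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between rows of an (n_perturbations, n_features) array."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

def most_similar(query, names, sim, k=3):
    """Return the k perturbations most similar to `query`, excluding the query itself."""
    i = names.index(query)
    order = np.argsort(-sim[i])  # indices sorted by descending similarity
    return [(names[j], float(sim[i, j])) for j in order if j != i][:k]

# Toy data (made up): two "pathways", each with its own phenotypic direction.
rng = np.random.default_rng(0)
pathway_a = np.concatenate([np.ones(4), np.zeros(4)])
pathway_b = np.concatenate([np.zeros(4), np.ones(4)])
names = ["GENE_A1", "GENE_A2", "GENE_B1", "GENE_B2"]  # hypothetical gene names
embeddings = np.stack([
    pathway_a + 0.1 * rng.normal(size=8),
    pathway_a + 0.1 * rng.normal(size=8),
    pathway_b + 0.1 * rng.normal(size=8),
    pathway_b + 0.1 * rng.normal(size=8),
])

sim = cosine_similarity_matrix(embeddings)
print(most_similar("GENE_A1", names, sim, k=2))
```

In this toy setup the top hit for GENE_A1 is its pathway partner GENE_A2, which is the behavior you want from a biologically meaningful map.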
But if we show all the gene knockouts ordered by genomic position, a curious pattern emerges: CRISPR knockouts look more similar to KOs on the same chromosome arm than to KOs on other arms, producing a striking image of a genome-wide CRISPR map in which genome structure is obvious!
Hmm, you say. That’s weird.
Yes, it is! And tomorrow I’ll dig into what it means.
If you missed the first tweetorial in the series, click here to understand what the red-and-blue heatmaps here mean, and how we use them to map the functions of genes at a genome-wide level:
To recap: if you knock out each gene in the genome, plot all their pairwise similarities, and sort by genomic position, a curious pattern emerges in which #CRISPR knockouts look more similar to KOs on the same chromosome arm than to KOs on other arms.
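The recap above can be sketched in a few lines: sort the knockouts by genomic position, compute pairwise similarities, and compare same-arm pairs with different-arm pairs. This is a toy illustration, not our pipeline. It assumes cosine similarity between embeddings, and the arm labels, positions, and the arm-level signal baked into the fake embeddings are all made up to mimic the pattern.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between rows of an (n_genes, n_features) array."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

def within_vs_between_arm(sim, arms):
    """Mean similarity of gene pairs on the same arm vs. on different arms."""
    arms = np.asarray(arms)
    same_arm = arms[:, None] == arms[None, :]
    off_diagonal = ~np.eye(len(arms), dtype=bool)  # drop self-similarity
    within = sim[same_arm & off_diagonal].mean()
    between = sim[~same_arm].mean()
    return float(within), float(between)

# Toy data (made up): six genes on two arms, listed in scrambled order.
arms = np.array(["1q", "1p", "1q", "1p", "1p", "1q"])
positions = np.array([150, 10, 200, 50, 30, 240])  # hypothetical coordinates
arm_component = {
    "1p": np.concatenate([np.ones(8), np.zeros(8)]),
    "1q": np.concatenate([np.zeros(8), np.ones(8)]),
}
rng = np.random.default_rng(0)
# Each fake embedding is a shared arm-level signal plus gene-specific noise.
embeddings = np.stack(
    [arm_component[a] + 0.1 * rng.normal(size=16) for a in arms.tolist()]
)

order = np.argsort(positions)  # sort genes by genomic position
sim = cosine_similarity_matrix(embeddings[order])
within, between = within_vs_between_arm(sim, arms[order])
print(f"within-arm mean: {within:.2f}, between-arm mean: {between:.2f}")
```

With the arm-level signal built in, the sorted similarity matrix shows blocks along the diagonal and the within-arm mean comes out well above the between-arm mean.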
A primer: the @RecursionPharma platform takes images of cells under different conditions (disease agent, disease+drug, control, etc.), and feeds the images through a custom deep network to derive a high-dimensional (128-1024D) "embedding".
Instead of measuring, say, two parameters like viral titer and cell count, we measure 100s-1000s of parameters describing the morphology of cells in a plate. This information captures a lot of biology, as @i_draw_hexagons described in his tweetorial:
We developed a human cell model of SARS-CoV-2 infection, compared it to the field-standard monkey cell model, and screened ~1700 drugs.
Also: the entire cellular image dataset (~450GB of 5-channel microscopy) is available at rxrx.ai/rxrx19. 305,520 5-ch pics @ 1Mpx licensed CC-BY. Want some big image data for ML to help with the pandemic? Here it is. We've also released the DL image embeddings.
Now on to the paper. If you missed @zavaindar's explainer from Friday, it's a great one to start with. I'll provide my own insights into the work here.