If you missed the first tweetorial in the series, click here to understand what the red-and-blue heatmaps here mean, and how we use them to map the functions of genes at a genome-wide level:
To recap: if you knock out each gene in the genome, plot all their pairwise similarities, and sort by genomic position, a curious pattern emerges in which #CRISPR knockouts look more similar to KOs on the same chromosome arm than to KOs on other arms.
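For readers who want to see the recap above as code, here is a minimal sketch of how such a map could be assembled. It assumes one embedding vector per knockout and uses cosine similarity; the function name `similarity_map` and the inputs `chrom_index` and `position` are hypothetical, and the similarity measure is an assumption rather than the exact procedure behind the heatmaps.

```python
import numpy as np

def similarity_map(embeddings, chrom_index, position):
    """Pairwise cosine similarities with rows/columns sorted by genomic position.

    embeddings  : (n_genes, n_dims) array, one row per knockout (assumed input)
    chrom_index : (n_genes,) integer chromosome order for each gene
    position    : (n_genes,) coordinate of each gene within its chromosome
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # np.lexsort treats the LAST key as primary: chromosome first, then position.
    order = np.lexsort((position, chrom_index))
    x = embeddings[order]
    # Normalize each row to unit length so the dot product is cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T, order
```

Plotting the returned matrix as a heatmap gives one block per chromosome along the diagonal, because neighboring rows are neighboring genes.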
In fact, the image shows TWO maps: the @RecursionPharma #RxRx3 map in HUVEC (above the diagonal) and the cpg0016 map in U2OS made by the JUMP-CP consortium led by @DrAnneCarpenter and @shantanuXsingh (below the diagonal). The effect reproduces across labs, protocols, cell types, etc.
We call this “proximity bias,” as KO sims reflect genomic proximity, not just gene function.
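One simple way to put a number on this, sketched under assumptions: compare the mean similarity of knockout pairs on the same chromosome arm with the mean for pairs on different arms. The function `proximity_bias_score` and its inputs are hypothetical, and this is an illustration, not necessarily the statistic used in the manuscript.

```python
import numpy as np

def proximity_bias_score(sim, arm_labels):
    """Mean within-arm similarity minus mean between-arm similarity.

    sim        : (n, n) symmetric similarity matrix between knockouts
    arm_labels : length-n sequence of chromosome-arm labels, e.g. "5q"
    """
    sim = np.asarray(sim, dtype=float)
    arm_labels = np.asarray(arm_labels)
    same_arm = arm_labels[:, None] == arm_labels[None, :]
    # Exclude the diagonal: a knockout is trivially similar to itself.
    off_diag = ~np.eye(len(arm_labels), dtype=bool)
    within = sim[same_arm & off_diag].mean()
    between = sim[~same_arm].mean()
    return within - between
```

A score near zero would mean genomic proximity tells you nothing about similarity; a clearly positive score is the pattern described above.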
Cool tidbit: this bias even reflects non-canonical genome structure. There’s a known fusion in U2OS between chr5q and chr19q, and we see that patch of proximity bias in U2OS but not in HUVEC.
We also noticed that the strength of proximity bias fell off going from centromere-to-telomere, suggesting a model in which this bias was caused by chromosomal truncations: lose more genes in common, get stronger similarity.
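A toy version of that truncation model, as a sketch only: assume a cut at a gene removes everything from the cut site out to the telomere, and count how many lost genes two knockouts have in common. The function `shared_lost_genes` and the idea of indexing genes by distance from the centromere along one arm are illustrative assumptions.

```python
import numpy as np

def shared_lost_genes(distance_from_centromere, i, j):
    """Number of genes lost by BOTH knockouts i and j under the toy model.

    Toy assumption: cutting at a gene truncates the arm from that gene
    out to the telomere, so every gene at or beyond the cut is lost.
    """
    d = np.asarray(distance_from_centromere, dtype=float)
    lost_i = d >= d[i]
    lost_j = d >= d[j]
    return int((lost_i & lost_j).sum())
```

With ten evenly spaced genes on an arm, the two genes nearest the centromere share nine lost genes while the two nearest the telomere share one, which matches the centromere-to-telomere fall-off in similarity.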
Searching an internal database of 25k RNA-seq samples, jackpot: strong evidence for specific losses from cut-site to telomere in a number of samples!
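To make "loss from cut-site to telomere" concrete, here is a hedged sketch of the kind of signal one could look for in expression data: compare the average log fold-change (knockout vs. control) of genes distal to the cut site with genes proximal to it on the same arm. The function `distal_loss_signal` and its inputs are hypothetical, and this is not a description of the actual analysis.

```python
import numpy as np

def distal_loss_signal(log_fc, distance_from_centromere, cut_distance):
    """Mean log fold-change distal to the cut minus mean proximal to it.

    log_fc                   : per-gene log fold-change, knockout vs. control
    distance_from_centromere : per-gene distance along one chromosome arm
    cut_distance             : distance of the CRISPR cut site on that arm
    A clearly negative value is consistent with loss of the distal segment.
    """
    log_fc = np.asarray(log_fc, dtype=float)
    d = np.asarray(distance_from_centromere, dtype=float)
    distal = d > cut_distance      # between the cut site and the telomere
    proximal = d < cut_distance    # between the centromere and the cut site
    return log_fc[distal].mean() - log_fc[proximal].mean()
```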
Bulk RNA-seq doesn’t tell us whether this is a weak effect in many cells, or a strong effect in a few cells. So we searched #CRISPR datasets in @sandercbio's awesome scperturb.org collection, and sure enough: clear evidence of truncations in rare subpopulations of cells!
In tomorrow’s thread, I’ll continue to explain how proximity bias affects a broad range of #CRISPR functional genomics datasets and confounds the community’s efforts to decode #biology, by looking closely at the @CancerDepMap.
Tweetorial time! We @RecursionPharma mapped consequences of #CRISPR screening of >17K human genes, found a systematic bias confounding all CRISPR screens, traced its molecular cause, and propose a debiasing algorithm.
“But Imran,” you say, “I’d rather read your thrilling 41-page manuscript than read tweet threads!”
In this first tweetorial, I’ll share some of the foundations of the similarity-based “maps” we build @RecursionPharma as background for what we found out about CRISPR by building a map over the whole genome.
A primer: the @RecursionPharma platform takes images of cells under different conditions (disease agent, disease+drug, control, etc.), and feeds the images through a custom deep network to derive a high-dimensional (128-1024D) "embedding".
Instead of measuring, say, two parameters like viral titer and cell count, we measure 100s-1000s of parameters describing the morphology of cells in a plate. This information captures a lot of biology, as @i_draw_hexagons described in his tweetorial:
We developed a human cell model of SARS-CoV-2 infection, compared it to the field-standard monkey cell model, and screened ~1700 drugs.
Also: the entire cellular image dataset (~450GB of 5-channel microscopy) is available at rxrx.ai/rxrx19. 305,520 5-ch pics @ 1Mpx licensed CC-BY. Want some big image data for ML to help with the pandemic? Here it is. We've also released the DL image embeddings.
Now on to the paper. If you missed @zavaindar's explainer from Friday, it's a great one to start with. I'll provide my own insights into the work here.