If you missed the first tweetorial in the series, click here to understand what the red-and-blue heatmaps here mean, and how we use them to map the functions of genes at a genome-wide level:
To recap: if you knock out each gene in the genome, plot all their pairwise similarities, and sort by genomic position, a curious pattern emerges in which #CRISPR knockouts look more similar to KOs on the same chromosome arm than to KOs on other arms.
In fact, the image shows TWO maps: the @RecursionPharma #RxRx3 map in HUVEC (above the diagonal) and the cpg0016 map in U2OS made by the JUMP-CP consortium led by @DrAnneCarpenter and @shantanuXsingh (below): the effect reproduces across labs, protocols, cell types, etc.
We call this “proximity bias,” as KO sims reflect genomic proximity, not just gene function.
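For readers who want to poke at the idea, here is a minimal sketch of how one might quantify proximity bias in a similarity map. It assumes cosine similarity between per-gene embeddings and uses synthetic toy data; the function names, arm labels, and the shared per-arm signal are all illustrative assumptions, not the pipeline behind the maps in this thread.

```python
import numpy as np

def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between rows of emb (genes x features)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def within_vs_between_arm(sim, arms):
    """Mean similarity of same-arm gene pairs vs. different-arm gene pairs."""
    arms = np.asarray(arms)
    same = arms[:, None] == arms[None, :]
    off_diag = ~np.eye(len(arms), dtype=bool)  # ignore self-similarity
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return within, between

# Toy data (assumption): 20 genes on each of two arms, already sorted by
# genomic position, each carrying a shared arm-level signal plus noise.
rng = np.random.default_rng(0)
arms = np.array(["1p"] * 20 + ["1q"] * 20)
arm_signal = {"1p": rng.normal(size=16), "1q": rng.normal(size=16)}
emb = np.stack([rng.normal(size=16) + 1.5 * arm_signal[a] for a in arms])

sim = cosine_similarity_matrix(emb)
within, between = within_vs_between_arm(sim, arms)
print(f"within-arm: {within:.2f}, between-arm: {between:.2f}")
```

A within-arm mean well above the between-arm mean is the block-diagonal pattern visible in the heatmaps.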
Cool tidbit: this bias even reflects non-canonical genome structure. There’s a known fusion in U2OS between chr5q and chr19q & we see that patch of proximity bias in U2OS but not in HUVEC.
We also noticed that the strength of proximity bias fell off going from centromere-to-telomere, suggesting a model in which this bias was caused by chromosomal truncations: lose more genes in common, get stronger similarity.
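A toy way to see why truncations would produce that centromere-to-telomere falloff, assuming every truncation removes everything from the cut site to the telomere. Positions are normalized so 0 is the centromere and 1 is the telomere; `shared_loss` is a hypothetical helper for illustration, not a quantity from the paper.

```python
def shared_loss(p1, p2):
    """Fraction of the arm lost by BOTH truncations.

    A cut at position p removes the segment [p, 1], so two cuts share the
    loss of [max(p1, p2), 1].
    """
    return 1.0 - max(p1, p2)

# Pairs of nearby genes at increasing distance from the centromere.
for p1, p2 in [(0.1, 0.2), (0.5, 0.6), (0.8, 0.9)]:
    print(p1, p2, round(shared_loss(p1, p2), 2))
```

Pairs near the centromere lose more of the arm in common than pairs near the telomere, which is the direction of the falloff described above.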
Searching an internal database of 25k RNA-seq samples, jackpot: strong evidence for specific losses from cut-site to telomere in a number of samples!
Bulk RNA-seq doesn’t tell us whether this is a weak effect in many cells, or a strong effect in a few cells. So we searched #CRISPR datasets in @sandercbio's awesome scperturb.org collection, and sure enough: clear evidence of truncations in rare subpopulations of cells!
In tomorrow’s thread, I’ll continue to explain how proximity bias affects a broad range of #CRISPR functional genomics datasets and confounds the community’s efforts to decode #biology, by looking closely at the @CancerDepMap.
Friday got away from me, but it was a beautiful weekend.
Today, the last tweetorial: how to debias the proximity bias we found @RecursionPharma widely confounding #CRISPR screens, and what it means for the field.
Earlier tweetorials to understand the similarity maps we examined, what “proximity bias” means and its molec. origins in chrom. truncations, and how resources like @CancerDepMap are confounded by prox bias:
Finally, we share a way to de-prox-bias similarity-based maps. TL;DR: from each gene, subtract off the avg representation of unexpressed genes on the same arm. Simple and works! Above diag is pre-correction, below is post, in #RxRx3 & cpg0016.
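The TL;DR above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `emb` is a genes x features array of KO representations, `arms` labels each gene's chromosome arm, and `is_unexpressed` marks genes not expressed in the cell type. All three names, and the choice to leave arms with no unexpressed genes untouched, are assumptions for the example rather than details of the published method.

```python
import numpy as np

def debias_by_arm(emb, arms, is_unexpressed):
    """Subtract, per arm, the mean representation of unexpressed genes."""
    emb = np.asarray(emb, dtype=float)
    arms = np.asarray(arms)
    is_unexpressed = np.asarray(is_unexpressed, dtype=bool)
    corrected = emb.copy()
    for arm in np.unique(arms):
        on_arm = arms == arm
        ref = on_arm & is_unexpressed
        if not ref.any():
            continue  # assumption: no reference genes, leave arm unchanged
        corrected[on_arm] -= emb[ref].mean(axis=0)
    return corrected

# Toy data (assumption): two arms with different arm-wide offsets.
rng = np.random.default_rng(1)
toy_arms = np.array(["5q"] * 10 + ["19q"] * 10)
offsets = {"5q": 2.0, "19q": -3.0}
toy_emb = np.stack([rng.normal(size=8) + offsets[a] for a in toy_arms])
toy_unexpressed = np.tile([True, False], 10)  # every other gene unexpressed

toy_corrected = debias_by_arm(toy_emb, toy_arms, toy_unexpressed)
print(toy_emb.mean(axis=0).round(1))
print(toy_corrected.mean(axis=0).round(1))
```

The intuition for using unexpressed genes as the reference: knocking them out should have no functional phenotype, so whatever signal they share on an arm is a reasonable estimate of the arm-level artifact.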
Tweetorial 3: how the “proximity bias” we found @RecursionPharma confounding #CRISPR screens affects widely used screening resources like the @CancerDepMap (spoiler: the image below!), and where it comes from.
If you missed the first two tweetorials in the series, click here to understand the similarity maps we examine here, what “proximity bias” means, and its molecular origins in chromosomal truncations:
To recap: #CRISPR KOs are more phenotypically similar to KOs of unrelated genes on the same chromosome arm (1), and this bias arises from occasional truncations of chromosomes in which the chunk from the cut site to the telomere is lost (2,3,4).
Tweetorial time! We @RecursionPharma mapped consequences of #CRISPR screening of >17K human genes, found a systematic bias confounding all CRISPR screens, traced its molecular cause, and propose a debiasing algorithm.
“But Imran,” you say, “I’d rather read your thrilling 41-page manuscript than read tweet threads!”
In this first tweetorial, I’ll share some of the foundations of the similarity-based “maps” we build @RecursionPharma as background for what we found out about CRISPR by building a map over the whole genome.
A primer: the @RecursionPharma platform takes images of cells under different conditions (disease agent, disease+drug, control, etc.), and feeds the images through a custom deep network to derive a high-dimensional (128-1024D) "embedding".
Instead of measuring say, two parameters like viral titer and cell count, we measure 100s-1000s of parameters describing the morphology of cells in a plate. This information captures a lot of biology, as @i_draw_hexagons described in his tweetorial:
We developed a human cell model of SARS-CoV-2 infection, compared it to the field-standard monkey cell model, and screened ~1700 drugs.
Also: the entire cellular image dataset (~450GB of 5-channel microscopy) is available at rxrx.ai/rxrx19. 305,520 5-ch pics @ 1Mpx licensed CC-BY. Want some big image data for ML to help with the pandemic? Here it is. We've also released the DL image embeddings.
Now on to the paper. If you missed @zavaindar's explainer from Friday, it's a great one to start with. I'll provide my own insights into the work here.