Single-cell genomics assays are rooted in a handful of technologies: they require physical isolation of cellular material, molecular barcoding, and library generation. They also span multiple molecular modes, including RNA, DNA (ATAC), protein, and more, which makes preprocessing challenging. 1/🧵
One challenge of data (pre)processing (distinct from processing) is that multiple data types must be processed in a manner that minimizes batch effects. The challenge is ensuring that reads generated from assays are consistently catalogued, error-corrected, and counted. 2/🧵
To address this challenge, we present cellatlas, a tool for uniform preprocessing that builds on kallisto bustools (kb-python) and seqspec, in a collaboration with @DelaneyKSull and @lpachter. 3/🧵
cellatlas is a command line tool that generates the appropriate commands to uniformly preprocess sc(n)RNAseq data, sc(n)ATACseq data, Multiome data, feature barcoding data, CRISPR (PerturbSeq) data, Spatial (Visium) data, and more. 4/🧵
cellatlas leverages the seqspec specification to appropriately identify, extract, and correctly handle sequenced elements like barcodes and UMIs. It then selects the correct workflow (standard kallisto bustools, kITE, or snATAK) to deploy. 5/🧵 biorxiv.org/content/10.110…
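Under the hood, the mapping from a spec to a workflow boils down to knowing where each element sits on each read. Here is a minimal Python sketch of that idea, using a simplified stand-in for the seqspec schema (the helper and its input layout are hypothetical, not cellatlas internals); the output is a kallisto bus "-x" technology string:

```python
# Sketch: given the positions of barcode/UMI/cDNA elements on each
# read, emit a kallisto bus "-x" technology string. The tuple layout
# here is a simplified stand-in, not the actual seqspec schema.

def technology_string(elements):
    """elements: list of (kind, file_index, start, stop) tuples."""
    order = {"barcode": 0, "umi": 1, "cdna": 2}
    parts = sorted(elements, key=lambda e: order[e[0]])
    return ":".join(f"{f},{s},{e}" for _, f, s, e in parts)

# 10x Chromium v3-like layout: 16 bp barcode + 12 bp UMI on read 1,
# cDNA on read 2 (start 0, stop 0 means "use the whole read").
print(technology_string([
    ("barcode", 0, 0, 16),
    ("umi", 0, 16, 28),
    ("cdna", 1, 0, 0),
]))  # -> 0,0,16:0,16,28:1,0,0
```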
cellatlas is simple to use. Supply:
1. sequencing reads
2. a correct seqspec specification
3. a genome fasta
4. a genome annotation
5. (optional) feature barcodes
and the correct workflow will be generated for you (see the sketch below). No more worrying about providing FASTQs in the right order. 6/🧵
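For concreteness, here is roughly what an invocation might look like, sketched in Python via subprocess. The subcommand and flag names are assumptions on my part, so check the cellatlas README for the real interface:

```python
# Illustrative only: shelling out to cellatlas. The subcommand and
# flag names here are assumptions, not the documented interface.
import subprocess

subprocess.run([
    "cellatlas", "build",
    "-o", "out",                   # output directory
    "-m", "rna",                   # modality
    "-s", "spec.yaml",             # seqspec specification
    "-fa", "genome.fa",            # genome fasta
    "-g", "genome.gtf",            # genome annotation
    "R1.fastq.gz", "R2.fastq.gz",  # sequencing reads, any order
], check=True)
```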
cellatlas enables within-assay comparisons. We compare modes of DOGMAseq data (RNA/ATAC/surface protein/sample tags) measured in the same cells, generated by R. Duerr/W. Chen. cellatlas lets us hypothesize about an experimental cause for efficiency tradeoffs in reads/UMIs. 7/🧵
cellatlas also enables between-assay comparisons, a challenge rooted in the lack of uniform preprocessing. Ascribing differences in data quality to wet-lab techniques is hard when preprocessing tools inject unnecessary variability (e.g., by using different algorithms). 8/🧵
cellatlas solves this challenge with uniform preprocessing. Using cellatlas with 10x Multiome data (PBMCs) and DOGMAseq Multiome data (PBMCs) we find that DOGMAseq appears to be more efficient than 10x Multiome (at the same sequencing depth!) 9/🧵
We anticipate that uniform preprocessing will be useful in the development of new single-cell genomics assays, for example by revealing cross-technology tradeoffs. We also believe that uniform preprocessing will improve reproducibility. 10/🧵
This work builds on the efforts of many people including @pmelsted, @yarbslocin, @DelaneyKSull, @LambdaMoses, @lioscro, Fan Gao, @hjpimentel, @kreldjarn, @JaseGehring, Lauren Liu, @XiChenUoM, and many others. 11/🧵
In a new preprint, @lioscro, @JaseGehring, @lpachter and I describe a method and software (kITE) for quantifying orthogonal barcodes from assays such as Perturb-seq, ClickTagging, TAP-seq, CITE-seq, MULTI-seq, and 10x Feature Barcoding: 1/ biorxiv.org/content/10.110…
Orthogonal barcoding has become a method of choice for multimodal single-cell genomics. For example, multiplexing assays such as ClickTagging rely directly on click chemistry. 2/
That assay motivated us to develop kITE. Specifically, we needed to demultiplex ClickTags (developed by @JaseGehring, who initially prototyped the kITE approach) and, like others at the time, we had to write custom code to do it. 3/
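The idea behind kITE, as used in kb-python's kite workflow, is to index each feature barcode together with all of its single-base mismatches, so that pseudoalignment doubles as error correction. A minimal sketch with made-up barcodes:

```python
# Enumerate each feature barcode together with all Hamming-distance-1
# variants, written as a FASTA that can be indexed with kallisto.
# The barcodes below are made up for illustration.

def hamming1(seq):
    """Yield seq and every single-base substitution of it."""
    yield seq
    for i, base in enumerate(seq):
        for sub in "ACGT":
            if sub != base:
                yield seq[:i] + sub + seq[i + 1:]

features = {"tag1": "ACGTACGTAC", "tag2": "TTGCATTGCA"}

with open("mismatch_map.fa", "w") as f:
    for name, bc in features.items():
        for j, variant in enumerate(hamming1(bc)):
            f.write(f">{name}-{j}\n{variant}\n")

# kallisto index -i mismatch.idx mismatch_map.fa, then the standard
# BUS workflow counts reads against the feature barcodes.
```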
Using dimensional analysis, I estimate that the energy contained in the awful #Beirut explosion was approximately 12 terajoules = 3 kilotons of TNT. For reference, the "Little Boy" bomb dropped on #Hiroshima was ~13-18 kilotons of TNT. #orderofmagnitudephysics
To estimate this, I used the following relationship between the energy of the explosion (E), the radius of the blast wave (R), the time since the explosion (t), and the density of air (rho): E ~ rho * R^5 / t^2, the Taylor-Sedov blast-wave scaling, up to a dimensionless constant of order one.
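Plugging in numbers gives the right order of magnitude. A quick check in Python, where the radius and time are illustrative stand-ins rather than the values measured from the video:

```python
# Taylor-Sedov dimensional estimate: E ~ rho * R**5 / t**2.
# R and t below are illustrative stand-ins, not the measured values.
rho = 1.2    # density of air, kg/m^3
R = 140.0    # blast-wave radius at time t, m (assumed)
t = 0.070    # time since detonation, s (assumed)

E = rho * R**5 / t**2    # joules
kt_tnt = E / 4.184e12    # 1 kiloton of TNT = 4.184e12 J

print(f"E ~ {E:.1e} J ~ {kt_tnt:.1f} kt TNT")
# -> E ~ 1.3e+13 J ~ 3.1 kt TNT, the right order of magnitude
```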
To determine the exact points in time, I wrote a Python script that generates subtitles showing the elapsed time in milliseconds, then loaded the video and subtitles into @VlcMediaPlayer (with help from superuser.com/questions/9648…).
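The script itself is a few lines. A minimal sketch, assuming SRT output and a fixed cue interval (the original script may differ):

```python
# Write an .srt subtitle file that displays elapsed milliseconds,
# one cue per 10 ms interval. Step and duration are assumptions.

def ts(ms):
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

step, duration = 10, 30_000  # 10 ms cues over a 30 s clip (assumed)
with open("timer.srt", "w") as f:
    for i, start in enumerate(range(0, duration, step), 1):
        f.write(f"{i}\n{ts(start)} --> {ts(start + step)}\n{start} ms\n\n")
```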
1/6 Two days ago @NebulaGenomics announced the $299 genome in partnership with @TheBgiGroup. I was curious about the history of this cost, so I looked at historical price points curated by @genome_gov. Here are some results:
2/6 While the initial funding to create the first draft of the human genome was $2.7 billion, the actual price to sequence the final draft was closer to ~$100 million. This point is commonly missed. See here genome.gov/about-genomics…
3/6 If pricing were to scale with the inverse of Moore's law, we would be paying $186,061 today for a human genome. That is 622x more expensive than the $299 genome offered by @NebulaGenomics. This reduction in cost over ~20 years is equivalent to the genome getting $9.06 cheaper every minute.
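For the curious, the arithmetic behind those figures, sketched in Python; the ~$100 million baseline and ~21-year window are my assumptions about how the per-minute number falls out:

```python
# Back-of-the-envelope check of the thread's numbers. The ~$100M
# baseline and the ~21-year window are assumptions about how the
# per-minute figure was derived.
moores_law_price = 186_061  # hypothetical price if cost tracked Moore's law
nebula_price = 299

print(moores_law_price / nebula_price)  # ~622x

start_price = 100_000_000   # final-draft genome, ~$100M (assumed baseline)
years = 21                  # roughly two decades (assumed window)
minutes = years * 365.25 * 24 * 60

print((start_price - nebula_price) / minutes)  # ~$9.05 cheaper per minute
```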