Sina Booeshaghi
Sep 18 · 14 tweets · 4 min read
Single-cell genomics assays are rooted in a handful of technologies. They require physical isolation of cellular material, molecular barcoding, and library generation. Molecular modalities include RNA, DNA (ATAC), protein, and more, which makes preprocessing challenging. 1/🧵
One challenge of data preprocessing (as distinct from downstream processing) is that multiple data types must be processed in a manner that minimizes batch effects. In practice, this means ensuring that reads generated by each assay are consistently catalogued, error-corrected, and counted. 2/🧵
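The three steps named above (cataloguing reads by cell barcode, error-correcting barcodes, and counting UMIs) can be illustrated with a toy Python sketch. This is not cellatlas or kallisto bustools code: the one-mismatch correction rule against an onlist of known barcodes, the `(barcode, umi, gene)` record format, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict

BASES = "ACGT"


def hamming1_neighbors(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for b in BASES:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def correct_barcode(bc, onlist):
    """Return bc if it is on the onlist, else the single onlist barcode one
    mismatch away; return None if there is no match or the match is ambiguous."""
    if bc in onlist:
        return bc
    hits = [n for n in hamming1_neighbors(bc) if n in onlist]
    return hits[0] if len(hits) == 1 else None


def count_umis(records, onlist):
    """Catalogue (barcode, umi, gene) records by corrected barcode and count
    distinct UMIs per gene. Records whose barcode cannot be corrected are dropped."""
    umis = defaultdict(lambda: defaultdict(set))
    for bc, umi, gene in records:
        fixed = correct_barcode(bc, onlist)
        if fixed is not None:
            umis[fixed][gene].add(umi)
    return {bc: {g: len(u) for g, u in genes.items()} for bc, genes in umis.items()}


onlist = {"AAAA", "CCCC"}
records = [
    ("AAAA", "ACG", "g1"),
    ("AAAT", "ACG", "g1"),  # one mismatch from AAAA: corrected, same molecule
    ("AAAA", "TTT", "g1"),
    ("CCCC", "ACG", "g2"),
    ("GGGG", "ACG", "g2"),  # no onlist barcode within one mismatch: dropped
]
print(count_umis(records, onlist))  # {'AAAA': {'g1': 2}, 'CCCC': {'g2': 1}}
```

Dropping ambiguous barcodes (two onlist neighbors) rather than guessing is one common conservative choice; real tools differ in exactly these details, which is part of why uniform preprocessing matters.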
To address this challenge, we present cellatlas, a tool for uniform preprocessing that builds on kallisto bustools (kb-python, github.com/pachterlab/kb_…) and seqspec (github.com/IGVF/seqspec/), a collaboration with @DelaneyKSull and @lpachter.

📖: biorxiv.org/content/10.110… 3/🧵
cellatlas is a command-line tool that generates the commands needed to uniformly preprocess sc(n)RNA-seq, sc(n)ATAC-seq, Multiome, feature barcoding, CRISPR (Perturb-seq), and spatial (Visium) data, among others. 4/🧵
cellatlas leverages the seqspec specification (biorxiv.org/content/10.110…) to identify, extract, and correctly handle sequenced elements such as barcodes and UMIs. It then selects the appropriate workflow (standard kallisto bustools, kITE, or snATAK) to deploy. 5/🧵
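A hypothetical sketch of what "identify and extract" means in practice: given a description of where each element lives in each read, one can slice barcodes and UMIs out of reads and emit a `bc:umi:seq` technology string of `read,start,stop` triplets, the style of custom string kallisto bustools accepts. The `Region` class below is a simplified stand-in, not seqspec's actual schema or API; the example layout is 10x v3-style (16 bp barcode and 12 bp UMI in read 1, cDNA in read 2).

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Simplified, hypothetical stand-in for one region of a read layout."""
    read: int   # 0-based index of the FASTQ file (read) the region lies in
    kind: str   # "barcode", "umi", or "cdna"
    start: int  # 0-based start position within the read
    stop: int   # exclusive stop position; 0 means "to the end of the read"


def extract(reads, regions, kind):
    """Concatenate every region of the given kind from a tuple of read sequences."""
    parts = []
    for r in regions:
        if r.kind == kind:
            seq = reads[r.read]
            parts.append(seq[r.start:] if r.stop == 0 else seq[r.start:r.stop])
    return "".join(parts)


def technology_string(regions):
    """Format regions as a bc:umi:seq string of read,start,stop triplets."""
    def field(kind):
        return ",".join(f"{r.read},{r.start},{r.stop}" for r in regions if r.kind == kind)
    return ":".join(field(k) for k in ("barcode", "umi", "cdna"))


# A 10x v3-style layout: read 1 = 16 bp barcode + 12 bp UMI, read 2 = cDNA.
layout = [Region(0, "barcode", 0, 16), Region(0, "umi", 16, 28), Region(1, "cdna", 0, 0)]
reads = ("A" * 16 + "C" * 12 + "TTTT", "GATTACA")
print(extract(reads, layout, "barcode"))  # AAAAAAAAAAAAAAAA
print(extract(reads, layout, "umi"))      # CCCCCCCCCCCC
print(technology_string(layout))          # 0,0,16:0,16,28:1,0,0
```

Keeping the layout as data, rather than hard-coding positions per assay, is what lets a single tool handle many assays: a new assay needs a new layout description, not new parsing code.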
cellatlas is simple to use. Supply:
1. sequencing reads
2. a correct seqspec specification
3. genome FASTA
4. genome annotation
5. (optional) feature barcodes
and the correct workflow will be generated for you. No more worrying about providing FASTQs in the right order. 6/🧵
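As a rough illustration of "generating the workflow" for the RNA case, the sketch below builds `kb ref` and `kb count` command lines from the inputs listed above, ordering FASTQs by the read index the technology string refers to. cellatlas's actual generated commands may differ; the function name and output file names (`index.idx`, `t2g.txt`, `transcriptome.fa`) are hypothetical, and feature barcode (kITE) and ATAC (snATAK) workflows are not shown.

```python
import shlex


def build_kb_commands(fastqs_by_read, technology, genome_fasta, gtf, out_dir="out"):
    """Return (kb ref, kb count) command lines for a standard RNA workflow.

    fastqs_by_read maps the read index used in the technology string
    (0, 1, ...) to a FASTQ path, so FASTQs are always emitted in the order
    that string expects, regardless of the order the user listed them.
    """
    ref = ["kb", "ref", "-i", "index.idx", "-g", "t2g.txt",
           "-f1", "transcriptome.fa", genome_fasta, gtf]
    ordered = [fastqs_by_read[i] for i in sorted(fastqs_by_read)]
    count = ["kb", "count", "-i", "index.idx", "-g", "t2g.txt",
             "-x", technology, "-o", out_dir, *ordered]
    return shlex.join(ref), shlex.join(count)


ref_cmd, count_cmd = build_kb_commands(
    {1: "sample_R2.fastq.gz", 0: "sample_R1.fastq.gz"},  # given out of order
    "0,0,16:0,16,28:1,0,0",
    "genome.fa",
    "genes.gtf",
)
print(ref_cmd)
print(count_cmd)
```

Because the FASTQs are keyed by read index rather than by position in a list, the user-supplied order no longer matters.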
cellatlas enables within-assay comparisons. We compare the modalities of DOGMAseq data (RNA/ATAC/surface protein/sample tags) measured in the same cells, generated by R. Duerr and W. Chen. cellatlas allows us to hypothesize about an experimental cause for efficiency tradeoffs in reads/UMIs. 7/🧵
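One simple way to quantify per-modality efficiency is the mean number of reads spent per distinct molecule. The thread does not say which metric was used, so treat this as an assumed, illustrative measure; reads are represented as hypothetical `(barcode, UMI, feature)` tuples and the toy numbers below are invented, not DOGMAseq results.

```python
def library_efficiency(reads_by_modality):
    """Summarize each modality's reads: total reads, distinct molecules
    (distinct (barcode, UMI, feature) tuples), and mean reads per molecule."""
    summary = {}
    for modality, reads in reads_by_modality.items():
        n_reads = len(reads)
        n_umis = len(set(reads))
        summary[modality] = {
            "reads": n_reads,
            "umis": n_umis,
            "reads_per_umi": n_reads / n_umis if n_umis else float("nan"),
        }
    return summary


# Invented toy reads, one (barcode, UMI, feature) tuple per read.
toy = {
    "rna": [("AAAA", "ACG", "g1"), ("AAAA", "ACG", "g1"), ("AAAA", "TTT", "g1")],
    "protein": [("AAAA", "GGG", "CD4")] * 4,
}
for modality, stats in library_efficiency(toy).items():
    print(modality, stats)
```

A high reads-per-UMI value means sequencing is mostly re-reading molecules already seen, so extra depth in that modality buys few new molecules.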
cellatlas also enables between-assay comparisons, which are challenging because preprocessing is rarely uniform. Ascribing differences in data quality to wet-lab techniques is difficult when preprocessing tools inject unnecessary variability (e.g., due to different algorithms). 8/🧵
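A toy example of how an algorithmic choice alone can change counts: the same UMIs yield different molecule counts under two UMI deduplication rules. Neither function is taken from a specific tool; rule B is a simplified form of network-based deduplication, and UMIs are assumed to have equal length.

```python
from collections import Counter


def count_exact(umis):
    """Rule A: every distinct UMI sequence counts as one molecule."""
    return len(set(umis))


def count_merged(umis):
    """Rule B: visit UMIs from most to least abundant and merge any UMI that is
    one mismatch away from an already-kept (at least as abundant) UMI.
    Assumes all UMIs have the same length."""
    kept = []
    for umi, _ in Counter(umis).most_common():
        if not any(sum(a != b for a, b in zip(umi, k)) == 1 for k in kept):
            kept.append(umi)
    return len(kept)


# Reads for one gene in one cell: ACGA is plausibly a sequencing error of ACGT.
umis = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 2
print(count_exact(umis), count_merged(umis))  # 3 2
```

If two assays were preprocessed with rules A and B respectively, a 3-versus-2 difference would look like a wet-lab difference even though it comes entirely from software.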
cellatlas solves this challenge with uniform preprocessing. Using cellatlas with 10x Multiome data (PBMCs) and DOGMAseq Multiome data (PBMCs), we find that DOGMAseq appears to be more efficient than 10x Multiome (at the same sequencing depth!) 9/🧵
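A minimal sketch of one way to hold sequencing depth fixed when comparing two libraries: randomly downsample both to the same number of reads, then count distinct molecules. This is an assumption about how such a comparison could be set up, not the analysis in the preprint; reads are hypothetical `(barcode, UMI, feature)` tuples and the toy libraries are invented.

```python
import random


def umis_at_depth(reads, depth, seed=0):
    """Subsample `depth` reads without replacement and count distinct
    (barcode, UMI, feature) molecules among them."""
    if depth > len(reads):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))


def compare_at_equal_depth(reads_a, reads_b, seed=0):
    """Downsample both libraries to the smaller library's read count and
    return (depth, molecules in A, molecules in B)."""
    depth = min(len(reads_a), len(reads_b))
    return depth, umis_at_depth(reads_a, depth, seed), umis_at_depth(reads_b, depth, seed)


# Invented toy libraries: A has no duplicate reads, B resequences one molecule.
lib_a = [("AAAA", u, "g1") for u in ("ACG", "CGT", "GTA", "TAC")]
lib_b = [("CCCC", "ACG", "g1")] * 6
print(compare_at_equal_depth(lib_a, lib_b))  # (4, 4, 1)
```

Without matching depth, the deeper library would tend to show more molecules simply because it was sequenced more.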
We anticipate that uniform preprocessing will be useful in the development of new single-cell genomics assays, for example by revealing cross-technology tradeoffs. We also believe that uniform preprocessing will improve reproducibility. 10/🧵
This work builds on the efforts of many people including @pmelsted, @yarbslocin, @DelaneyKSull, @LambdaMoses, @lioscro, Fan Gao, @hjpimentel, @kreldjarn, @JaseGehring, Lauren Liu, @XiChenUoM, and many others. 11/🧵
cellatlas is open source and freely available: github.com/cellatlas/cell…. Analysis methods can be found here: github.com/pachterlab/BSP…. Feedback is welcome. 12/12
The @yarbslocin tag in 11/🧵 should say @yarbsalocin.

