Lior Pachter

Jun 6 • 25 tweets • 10 min read

The exciting reveal of Ultima Genomics last week was accompanied by the publication of four preprints. Intrigued by the potential of the technology, @sinabooeshaghi & I decided to take a look at the data. A 🧵 about our findings & a preprint we posted: biorxiv.org/content/10.110… 1/

We first looked at the company's own preprint on which the CEO is first author: biorxiv.org/content/10.110…

Unfortunately, no data. No code. There is not even supplementary material, which the authors write "will be made available in the near future." 2/

Without data or code, obviously one cannot check the claims of the company. But in this case one cannot even understand the claims. E.g. the description for Fig. 2e in the Methods is useless without code to explain what was actually done to produce it. 3/

So we looked at preprint #2, which is on whole-genome methylation sequencing from the Snyder lab: biorxiv.org/content/10.110…

This time: "The datasets used and/or analyzed during the current study are available from the corresponding author upon request." 4/

https://twitter.com/ceptional/status/1533567322736435200

I'm tired of this "data available upon request" thing, and I'm not the only one. In the past year I've had to make at least a dozen such requests, and my success rate in obtaining data is even lower than NIH funding rates. 5/

So we looked at preprint #3, which published Perturb-seq, some of it done with Ultima: biorxiv.org/content/10.110…

Went to look for the data link in the preprint but only got "Raw sequencing data will be deposited into SRA."

6/

Protip: when reading a @biorxivpreprint preprint it can pay off to chase the tweet links from the @biorxivpreprint page. The authors of this paper did eventually tweet out a link to some of their data (I leave the task of finding it as an exercise to the reader!) 7/

We did eventually find data linked from a preprint... yay preprint #4 on single-cell RNA-seq!! biorxiv.org/content/10.110…
This was the only preprint with an accession (GSE197452) AND code (github.com/seanken/Compar…) AND Supp. material. So we went with data from this preprint.
8/

Our curiosity was piqued when we read in the abstract that "[Ultima Genomics data] show comparable results to existing [Illumina] technology" but saw a different story in Extended Data Figure 1: 9/

So we decided to analyze the data ourselves to see what's going on. Ultima data is single-end, and pre-processing it required a tool that can handle the placement of the barcodes, UMIs & cDNA. Modifying kallisto | bustools was easy (thanks modularity!) and we now have a tool. 10/
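To illustrate what "handling the placement of the barcodes, UMIs & cDNA" means for a single-end read, here is a minimal Python sketch. It is not the kallisto | bustools implementation. The layout (barcode, then UMI, then cDNA) and the default lengths are hypothetical placeholders, not the actual Ultima read structure, and `split_read` is a name made up for this example.

```python
from typing import NamedTuple


class ParsedRead(NamedTuple):
    barcode: str
    umi: str
    cdna: str


def split_read(seq: str, barcode_len: int = 16, umi_len: int = 12) -> ParsedRead:
    """Split one single-end read into barcode, UMI and cDNA.

    Assumes (hypothetically) that the read is laid out as
    barcode | UMI | cDNA with fixed lengths for the first two parts.
    """
    if len(seq) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode + UMI")
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:barcode_len + umi_len]
    cdna = seq[barcode_len + umi_len:]
    return ParsedRead(barcode, umi, cdna)
```

With paired-end data the barcode/UMI and the cDNA typically arrive in separate files, so no such slicing within a single read is needed; that difference is why single-end data calls for a tool that knows where each part sits.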

Ok, so now we were ready to do our own apples-to-apples comparison, except the Illumina data had 55bp of cDNA, vs. 174bp in the Ultima (not clear why the authors didn't just sequence standard 150bp Illumina reads). So we trimmed the Ultima cDNA data to 55bp. 11/
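The trimming step can be sketched in a few lines of Python. This is an illustration only: it assumes the reads are already parsed into (header, sequence, quality) tuples and that trimming keeps the first 55 bases of the cDNA, which the thread does not specify. `trim_records` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def trim_records(records: Iterable[FastqRecord], length: int = 55) -> Iterator[FastqRecord]:
    """Truncate sequence and quality strings to at most `length` bases.

    Assumption: bases are kept from the start of the read; reads that are
    already shorter than `length` pass through unchanged.
    """
    for header, seq, qual in records:
        yield header, seq[:length], qual[:length]
```

Trimming the longer reads down, rather than comparing 174bp to 55bp directly, keeps read length from confounding the comparison between technologies.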

A bit of work, but we got an apples-to-apples comparison and, to first order, results from the two technologies do look similar, as claimed. These are the kneeplots: 12/
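For readers unfamiliar with knee plots: they rank cell barcodes by their UMI counts and plot count against rank. A minimal sketch of the underlying computation follows, assuming per-barcode UMI totals are already available as a dictionary. The function name `knee_curve` is hypothetical, and plotting is left out.

```python
from typing import Dict, List, Tuple


def knee_curve(umi_counts: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (rank, umi_count) pairs with barcodes sorted by decreasing count.

    Plotting these pairs on log-log axes gives the usual knee plot.
    """
    ranked = sorted(umi_counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))
```

Two technologies run on the same sample should give curves with a similar shape and a knee at a similar rank if they recover comparable numbers of cells and UMIs per cell.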

Next we decided to dig deeper and look at whether there were any genes that differ in the number of counts. There are many. So we picked the top nuclear gene to look at in detail: TMSB4X. It's the 10th most highly expressed gene in the PBMCs assayed. Also an interesting gene. 13/
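One simple way to rank genes by how much their counts differ between two platforms is the absolute log2 fold change with a pseudocount. The thread does not say which statistic was used, so the sketch below is an assumption, with a hypothetical function name, and it presumes the two count tables are already comparable in depth.

```python
import math
from typing import Dict, List, Tuple


def rank_by_log_fold_change(
    counts_a: Dict[str, float],
    counts_b: Dict[str, float],
    pseudocount: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank genes by |log2((a + pseudocount) / (b + pseudocount))|, largest first.

    Genes missing from one table are treated as having zero counts there.
    """
    genes = set(counts_a) | set(counts_b)
    lfc = {
        g: math.log2(
            (counts_a.get(g, 0.0) + pseudocount) / (counts_b.get(g, 0.0) + pseudocount)
        )
        for g in genes
    }
    return sorted(lfc.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

The pseudocount keeps the ratio finite for genes with zero counts on one platform, at the cost of shrinking fold changes for lowly expressed genes.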

There was a large differential in the number of counts for this gene (how large you might ask... you'll find out in the next tweet!) The differential left us curious as to why. So we took a deep dive into this gene, including aligning (not just pseudoaligning) reads to it. 14/

Results are shown below. The Ultima error rate is 10x that of Illumina. Thus 4.6x fewer reads are aligned. As a result, pseudoalignment helps rescue reads (in fact 2.1x more than with alignment), but not all. With Illumina reads there is no diff. between pseudoalignment and alignment. 15/

This gene is worse than most because it has a TTTTTTTT sequence in it (homopolymer of length 8). The technology is terrible at and around such homopolymers (and even shorter ones). An IGV screenshot shows just how bad. 16/

The indel situation is particularly bad. 17/

This matters, because with Ultima there is going to be significant bias on certain genes that may be difficult to account and correct for. The human genome is full of homopolymers. We measured how many... lots! 18/
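Counting homopolymers in a sequence is straightforward with a regular expression. The sketch below is one way to do it, not necessarily how the count in the preprint was produced. The default threshold of 8 mirrors the TTTTTTTT example above, and `find_homopolymers` is a hypothetical name.

```python
import re
from typing import List, Tuple


def find_homopolymers(seq: str, min_len: int = 8) -> List[Tuple[int, str, int]]:
    """Return (start, base, run_length) for each maximal run of one base.

    Only runs of A, C, G or T with length >= min_len are reported;
    positions are 0-based.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]
```

Lowering `min_len` shows how quickly the number of affected sites grows, which matters given that shorter homopolymers are also problematic.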

And BTW, TMSB4X may be a renal cancer biomarker, so getting its abundance right matters❗️ 19/

As for Ultima Genomics, yes, it has a cool new technology that looks faster (which is important), and maybe provides a useful tradeoff of lower cost / higher error rate than Illumina; perhaps useful for assays like #scRNAseq where only read assignment is needed. We'll see. 20/

But the first $100 genome? Ultima Genomics is not first; there is already one available by Nebula. Now I know... this is only 0.4x coverage (30x is $300), but if Ultima Genomics can't sequence homopolymers then is it really $100? 🤷‍♂️ Also 300/100 = 3 (not 5 or 10). 21/

https://twitter.com/holtjma/status/1531680056707252224

A few final thoughts: the lack of data release by Ultima Genomics is poor form. It's also disappointing to see many researchers hype the company without looking at the reads. Genome data is still not available. 22/

The level of hype is through the roof. @NIH tweeted out that this tech will "ensure that people from ancestrally diverse backgrounds will benefit equitably". Really? Does @NIH think individuals from ancestrally diverse backgrounds have no homopolymers? 23/

https://twitter.com/genome_gov/status/1533827507643985925

Also, what's with the complete omission of any discussion of BGI / MGI? What am I missing? 👀 academic.oup.com/nar/article/49… 24/

Tl;dr Ultima Genomics setting up an SBS company is impressive. The speed advance is great. It may be useful for apps where homopolymers don't matter much. And there are interesting compbio challenges ahead to make it better. But error rates are high & the crazy hype is ☹️. 25/25
