A recent publication by Dennis Lo et al applied long-read sequencing (LRS) in the prenatal screening (#NIPT) setting. It's a rather unorthodox technology/application pairing, and it's got me scratching my head a bit.
For context, earlier this year, Lo et al published a convolutional neural network ("the HK model") that enabled PacBio LRS devices to read methylation (5mC) across the entire genome with very high fidelity. This is important later.
I'll summarize my main takeaways from the current paper and end with some of my open questions/concerns.
1.
The authors showed the presence of a large amount of long (>500 bp) cell-free #DNA in maternal plasma. We likely have been systematically underestimating the presence of long cfDNA because short-read #NGS is the predominant method used in NIPT.
1. (Cont.)
The fact that long cfDNA is present in these quantities is interesting by itself. While I'm not sure if this discovery will give rise to new diagnostic applications, it will undoubtedly result in new biological learnings. I'm excited about this.
2.
The authors discovered that sufficiently long cfDNA fragments (>1.8 kb) carry enough information that LRS instruments can determine the fragments' tissue-of-origin (TOO) at a single-molecule level. Using the HK Model, they showed a TOO AUC of 0.89.
2. (Cont.)
In English, this means that LRS can tell whether single molecules of DNA came from mom or baby. This could be helpful in the event that the cfDNA fragment harbors an informative mutation and we're trying to figure out whether the baby has inherited it.
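(A toy sketch of the idea, not the authors' pipeline: I'm assuming per-CpG methylation calls from the HK model and that placental cfDNA is globally hypomethylated relative to maternal cfDNA, then scoring each molecule by its mean methylation and checking how well that separates the two.)

```python
# Toy sketch: score each long cfDNA molecule by its mean CpG methylation and see
# how well that alone separates maternal from fetal (placental) molecules via AUC.
# All numbers are made up; the classifier in the paper is more sophisticated.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated per-CpG methylation calls (1 = methylated) for long reads (>1.8 kb).
# Assumption: placental cfDNA is globally hypomethylated vs. maternal cfDNA.
maternal = [rng.binomial(1, 0.75, size=rng.integers(15, 40)) for _ in range(500)]
fetal    = [rng.binomial(1, 0.55, size=rng.integers(15, 40)) for _ in range(500)]

scores = np.array([m.mean() for m in maternal + fetal])    # per-molecule methylation fraction
labels = np.array([1] * len(maternal) + [0] * len(fetal))  # 1 = maternal, 0 = fetal

print(f"single-molecule TOO AUC (toy data): {roc_auc_score(labels, scores):.2f}")
```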
3.
The ends of short and long (>500 bp) cfDNA fragments look quite different from one another. When I say 'ends', I mean the first and last four letters of each cfDNA fragment. These are called 4-mers, and there are 256 (4⁴) possible ones. See how different they are below:
3. (Cont.)
Okay, so what? Well, the authors hypothesized that changes in the relative abundances of these 4-mers could be used as a biomarker to detect a fairly serious pregnancy complication called preeclampsia. I'll explain the results before discussing preeclampsia.
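(If you want to see what this feature actually is, here's a minimal sketch of turning fragment ends into a 256-dimensional 4-mer frequency vector. The fragment sequences are made up, just to show the shape of the input such a classifier would work from.)

```python
# Toy sketch: build the 256-dimensional end-motif feature vector for a set of
# cfDNA fragments by counting the first and last 4 bases of each fragment.
from collections import Counter
from itertools import product

ALL_4MERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 4^4 = 256 motifs

def end_motif_profile(fragments):
    """Return the relative abundance of each 4-mer across both fragment ends."""
    counts = Counter()
    for seq in fragments:
        counts[seq[:4]] += 1   # 5' end motif
        counts[seq[-4:]] += 1  # 3' end motif
    total = sum(counts.values())
    return {k: counts.get(k, 0) / total for k in ALL_4MERS}

# Hypothetical fragments, just to show the shape of the feature.
fragments = ["CCCAGGTACGTTAGCA", "CCTAGGCATTGACCGA", "AAGGTTCCGATCCCAA"]
profile = end_motif_profile(fragments)
print(sum(profile.values()))             # 1.0 (relative abundances)
print(profile["CCCA"], profile["CCTA"])  # abundance of two specific motifs
```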
3. (Cont.)
Based on the authors' classifier, they showed perfect discrimination (AUC = 1) between cases and controls, albeit in a very small patient population (n=20). Obviously this is a very good result, but I'm not putting much weight on it given the study size.
These were my three main takeaways from this paper (so far), though the authors did also show methods to deduce maternal inheritance and detect monogenic disorders. I'm fuzzier on whether there's a technical leap in these last two areas, so I'll hold off until I know more.
I'll switch gears to talk a bit about some potential advantages and obstacles with using LRS in the NIPT setting. I'd appreciate any and all feedback as I work through these.
First, I'll talk about preeclampsia. You can read some fast facts here:
While the disorder isn't rare, I'm not sure what the cost-utility of the test would be. It seems like the only remedies are low-dose aspirin or giving birth, which may or may not be an option depending upon the stage of pregnancy.
The current diagnostic paradigm seems lackluster (do you have high blood pressure, protein in your urine, or other non-specific symptoms after week 20?). While more accuracy/earlier detection would be great, these non-specific symptoms are SUPER cheap to detect.
Meanwhile, there's an up-and-coming (multianalyte) test being commercialized by Progenity called Preecludia which seems to have set the new standard for performance. Link below:
Another fact to consider is that LRS would add a LOT of cost to a test in the prenatal setting, which as a market is very price-sensitive, with many patient-pay fees hovering below $200. Based on the Lo et al paper, it seems a single sample would require a ~$2000 SMRT cell.
Then again, the method in the paper isn't optimized at all. In fact, most sequence reads were FAR below the optimal length for PacBio sequencing (which is ~20kb). The molecules are way over-sequenced, which adds cost while returning only marginal gains in data quality.
Since the median length for cfDNA eligible for TOO analysis was ~1.8kb, I'd reckon an easy hack would be to re-run this protocol through PacBio's new programmable #concatenation method which stitches together small molecules into big loops:
Even with this, I'm not quite sure about the economics of the preeclampsia application. Zooming out, though, what other applications might extend from these new discoveries? Might they support LRS having a presence in the NIPS/T market? Maybe!
Here's where I may need some outside input. Would a single-molecule method be able to have a lower limit of detection, that is, the ability to generate high-quality data earlier in pregnancy? Could one get away with more shallow coverage?
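(For anyone sanity-checking the coverage question with me, here's the simple counting-statistics napkin model I have in my head: a plain binomial z-score argument for trisomy 21, not anything from the paper. The ~1.3% chr21 read fraction and the z = 4 threshold are my assumptions.)

```python
# Back-of-envelope: how many total cfDNA reads do you need so the excess of
# chr21 reads from a trisomy-21 fetus stands out from binomial counting noise?
import math

def reads_needed(fetal_fraction, p_chr21=0.013, z=4.0):
    """Total reads for a z-score detection of trisomy 21 at a given fetal fraction."""
    excess = p_chr21 * fetal_fraction / 2           # expected bump in chr21 read fraction
    # Require: excess * N >= z * sqrt(N * p * (1 - p))  =>  solve for N
    return math.ceil((z ** 2) * p_chr21 * (1 - p_chr21) / (excess ** 2))

for ff in (0.04, 0.10, 0.20):
    print(f"fetal fraction {ff:.0%}: ~{reads_needed(ff):,} reads")
```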
If the market is really only concerned with trisomies/aneuploidies, is there any real (market) benefit for being able to detect more monogenic disorders at an early stage? I'm not quite convinced this is the case (yet).
While the authors used PacBio, I see no reason why nanopore couldn't also be used here as the technique is fragment length independent. In fact, if the library consists of both short and long pieces of cfDNA, the prep may be easier on nanopore.
Then again, I'm not quite sure how important accuracy would be to this application, but I think it's probably feasible that both flavors of LRS could be used here.
To summarize, this paper shows that:
1. There's long cfDNA in maternal plasma and much more of it than we thought. What was an unknown unknown is now a known unknown.
2. LRS instruments can divine the tissue-of-origin at a single molecule level using methylation.
3. End motifs, especially those of long fragments, could be a potential avenue for biomarker discovery in the prenatal (#NIPT) setting.
Beyond that, this paper is more of a launching pad for further inquiry/investigation (in my opinion).
I wish I knew how important the small and large fragments were/are to the preeclampsia classifier. Could you get away with only building a large-fragment library and not wasting PacBio ZMWs on tiny fragments? If so, that makes this more reasonable.
You can get 40 million (~1kb) CCS reads on one SMRT Cell 8M ($2,000) using programmable concatenation. The paper suggests ~2.4M CCS reads / sample, though only ~11% of those are >1kb.
That's 264,000 large cfDNA reads / sample?
So, 40 million / 264,000 ≈ 151 samples / SMRT Cell 8M, or roughly $13 / sample (consumables only).
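(Writing out the napkin math so someone can check me. The per-sample read counts are from the paper; the $2,000 SMRT Cell 8M price and 40M concatenated ~1kb CCS reads per cell are my assumptions from above.)

```python
# Napkin math: per-sample consumables cost if only the long (>1 kb) cfDNA
# fraction were run through programmable concatenation on a SMRT Cell 8M.
smrt_cell_cost = 2_000            # USD per SMRT Cell 8M (approximate)
ccs_reads_per_cell = 40_000_000   # ~1 kb CCS reads per cell with concatenation (assumed)
ccs_reads_per_sample = 2_400_000  # CCS reads per sample, per the paper
frac_over_1kb = 0.11              # fraction of those reads >1 kb, per the paper

long_reads_per_sample = ccs_reads_per_sample * frac_over_1kb   # ~264,000
samples_per_cell = ccs_reads_per_cell / long_reads_per_sample  # ~151
print(f"{long_reads_per_sample:,.0f} long reads / sample")
print(f"{samples_per_cell:.0f} samples / cell -> ${smrt_cell_cost / samples_per_cell:.2f} / sample")
```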
That seems reasonable(ish)?
Then again, that's just large fragments and doesn't fit the design of this paper.
Moreover, my understanding is that concatenation is harder and more error-prone with very small fragments. Perhaps this could be optimized for smaller fragments in the future, but for now I'm assuming anything below 1kb is ineligible to go into a concatenated molecule.
@GenomicsCow Would really appreciate your thoughts here re: carrier screening v. NIPS/T. This one isn’t clicking for me.
@MJLBio @Sanctuary_Bio @Biohazard3737 Sure! I realize I was being a little vague with those statements. Generally, I think you're correct in your interpretation of the importance of P2 (great $/GB, but at a smaller scale) as well as duplex sequencing.
Something that is important to recognize, though ...
@MJLBio @Sanctuary_Bio @Biohazard3737 ... is how product deployment works differently between PacBio and Nanopore, which is partly an artefact of culture and of time spent in the public markets. I'm not advocating for one over the other with my next statements.
@MJLBio @Sanctuary_Bio @Biohazard3737 PacBio has been a public company for a long time. While the management has changed much since the failed Illumina merger, the familiarity with how to operate as a public company has not.
PacBio is more secretive and only unveils fully built-out commercial products.
I'd like to share my initial reaction to today's Berkeley Lights report. But first, I need to do some housekeeping. I can't comment on stock movements, share financial projections, or debate fair value.
Generally, I respect anyone who's put this much work into a topic. I won't pretend to have a clean rebuttal to every point. In my experience, beyond the hyperbole and hasty generalizations, there is some truth in these types of reports.
I want to soberly appraise those truths.
Also, I'd invite the subject-matter experts waiting in the wings to build off of this thread, add detail, or share their experiences. Ultimately, we're all after the same thing.
I will start with a few concessions and end with a few counterpoints to today's report:
Imagine that a meteor was hurtling through space towards the Earth. Its speed and trajectory indicate that it will destroy the planet in approximately 10 years.
Now, let's say that our best sensors are only ...
... capable of seeing said meteor 1 year in advance. So, 9 years go by and we are blissfully unaware of our impending doom. Then, at the 9-year mark, we detect the meteor and measure our remaining survival time to be just 1 year.
What if I gave you a better sensor? What if this sensor could see the meteor from 10 years away instead of just 1?
How long would our survival time be? While we may have a 10-year lead time instead of a 1-year lead time, the meteor still strikes us on the same day.
As short-read #sequencing (SRS) costs begin to drop again, undoubtedly fueled by a resurgence in competition, I suspect many liquid biopsy providers will add blood-based whole-genome sequencing (WGS) to supplement, or replace, the deep targeted sequencing paradigm.
With a few exceptions, most clinical-stage diagnostic companies build patient-specific panels by sequencing the solid tumor, then downselecting to a few dozen mutations to survey in the bloodstream.
I don't think this approach is going anywhere anytime soon.
However useful, this deep-sequencing approach suffers from several challenges:
1. It requires access to tissue. 2. It requires the construction of patient-specific PCR panels. 3. It requires significant over-sequencing ($$$). 4. It introduces a third layer of error (PCR).
We often discuss how more comprehensive and sensitive techniques improve the diagnostic yield for patients affected by rare genetic diseases. Indeed, yields have improved as we've gone from microarrays to whole genome #sequencing.
However, there's another critical component.
Case-Level Reanalysis (CLR)
By reanalyzing genomic data, as our global knowledge-base grows, we improve diagnostic yields.
We believe the broadest tests should be done first to avoid the need to re-contact and re-accession patient samples.
The economics for both the lab and the patient also change dramatically in a 'generate-once-reassess-often' framework. As more is known, variant interpretation may shift from being more manual to more automated.
Still, this is a really hard technological problem.
The widespread adoption of liquid biopsy seems to be 'un-commoditizing' DNA synthesis in the molecular diagnostics industry.
Recall that synthetic DNA probes, molecules that bind and pull target DNA out of solution, are a critical input for liquid biopsy.
Diagnostics companies buy probes to use in their clinical tests, oftentimes in bulk, from a synthetic DNA provider. There's been a prevailing notion recently that DNA providers only can differentiate on the basis of cost or turnaround time.
I think liquid biopsy changes this.
Firstly, a huge technical constraint in liquid biopsy is the availability of cancerous DNA in a tube of blood, which drops off sharply as tumor size decreases.
Remember that smaller tumors don't leak as much DNA into the bloodstream.
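(A toy Poisson calculation shows why this constraint bites: with a fixed number of genome equivalents recoverable from one tube, at low tumor fractions the mutant fragment often simply isn't in the sample, no matter how deeply you sequence. The ~5,000 genome equivalents per tube is my rough assumption, not a figure from any specific paper.)

```python
# Toy Poisson model: probability that a single blood draw contains at least one
# mutant cfDNA fragment at a given locus, as a function of tumor fraction.
import math

genome_equivalents = 5_000  # assumed cfDNA genome equivalents recovered per tube

for tumor_fraction in (1e-2, 1e-3, 1e-4, 1e-5):
    expected_mutant_copies = genome_equivalents * tumor_fraction
    p_at_least_one = 1 - math.exp(-expected_mutant_copies)
    print(f"tumor fraction {tumor_fraction:.0e}: "
          f"E[mutant copies] = {expected_mutant_copies:.2f}, "
          f"P(>=1 in tube) = {p_at_least_one:.2f}")
```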