Have you heard about Revio by @PacBio at #ASHG22? It is such an amazing platform, 4x30-fold HiFi human genomes every 24 hours. Let me show you what it took to get this bioinformatics performance on the instrument. A thread on CCS GPU acceleration and DeepConsensus productization.
The team behind CCS has worked hard over the past years to massively reduce runtime for the initial SQIIe and subsequent releases. We pulled every trick to get the CPU code as fast as possible, but Revio is a completely different beast: it generates 90 Gb of HiFi data per SMRT Cell 25M!
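For a feel of what 90 Gb per SMRT Cell means, here is a back-of-the-envelope sketch. The ~3.1 Gb human genome size is my assumption for illustration, and `fold_coverage` is a made-up helper name.

```python
def fold_coverage(yield_bases, genome_bases):
    """Average fold coverage = sequenced bases / genome size."""
    return yield_bases / genome_bases

HIFI_YIELD_PER_CELL = 90e9   # 90 Gb HiFi per SMRT Cell 25M (from the thread)
HUMAN_GENOME_SIZE = 3.1e9    # assumption: roughly 3.1 Gb haploid human genome

coverage = fold_coverage(HIFI_YIELD_PER_CELL, HUMAN_GENOME_SIZE)
print(f"~{coverage:.0f}-fold per SMRT Cell")
```

Under that assumption one cell lands right around the 30-fold mark.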
The polish algo in CCS is an HMM, filling out matrices at its core. The nature of its operations makes an efficient GPU implementation challenging. We’ve ported it completely onto @nvidia GPUs and achieved a 10x speedup over a dual 64c AMD EPYC, easily the fastest HMM on GPU.
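To illustrate what "filling out matrices" means for an HMM, here is a minimal sketch of the textbook forward algorithm on a toy two-state model. This is not the CCS polish model; the states, probabilities, and function name are all hypothetical.

```python
# Toy two-state HMM over the DNA alphabet (all numbers are made up).
STATES = ("S0", "S1")
START_P = {"S0": 0.5, "S1": 0.5}
TRANS_P = {"S0": {"S0": 0.9, "S1": 0.1},
           "S1": {"S0": 0.2, "S1": 0.8}}
EMIT_P = {"S0": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
          "S1": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Fill the forward matrix column by column and return P(obs).

    alpha[t][s] is the probability of emitting obs[:t+1] and ending in
    state s. Assumes obs is non-empty.
    """
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for symbol in obs[1:]:
        prev = alpha[-1]
        # Every cell in this column needs the whole previous column.
        alpha.append({
            s: emit_p[s][symbol] * sum(prev[r] * trans_p[r][s] for r in states)
            for s in states
        })
    return sum(alpha[-1].values())

print(forward_probability("ACGT", STATES, START_P, TRANS_P, EMIT_P))
```

Each column depends on the one before it, which is one reason such recurrences do not parallelize trivially.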
The CCS draft step has three core algos: mapping, alignment, and draft sequence generation. Every algo and every line has been checked for tech debt and room for optimization.
For draft sequence generation we are using Sparc by Ye C, Ma Z. (2016). GPU experiments were not that successful, so we’ve concentrated on a CPU rewrite and achieved a more than 10x speedup over the reference implementation with identical output, ~10µs/ZMW wall time.
Bottlenecks in mapping have been molded into SIMD, hash methods for collecting seed hits carefully benchmarked and improved, and reseeding enabled to get better HiFi yield for your low-complexity molecules ... we could have a whole seminar on this 😊
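If "collecting seed hits" sounds abstract, here is a minimal sketch of the general idea with a plain Python dict as the hash table. It is not the CCS mapper (no SIMD, no reseeding); the function names and the k-mer size are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(target, k):
    """Map every k-mer in target to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index

def collect_seed_hits(query, index, k):
    """Return (query_pos, target_pos) pairs for every shared k-mer."""
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            hits.append((q, t))
    return hits

index = build_kmer_index("ACGTACGT", k=4)
print(collect_seed_hits("GTACG", index, k=4))
```

Hits that line up on the same diagonal (target_pos minus query_pos) are what a mapper would then chain into candidate alignments.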
For alignment in CCS, we are using edlib and ksw2, both compute-intensive algorithms. Though not integrated yet, we’ve implemented the fastest short- and long-read double-affine aligner on GPU, beating every CPU, GPU, and FPGA out there. Stay tuned.
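For those wondering what "double-affine" buys you: I read it here as the two-piece affine gap cost known from ksw2/minimap2, where a gap of length k costs min(q + k*e, q2 + k*e2). A minimal sketch of just that cost function follows; the default penalties mirror minimap2's documented defaults and are not necessarily what the GPU aligner uses.

```python
def two_piece_gap_cost(length, q=4, e=2, q2=24, e2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The first piece (q, e) is cheap to open but expensive to extend; the
    second (q2, e2) is expensive to open but cheap to extend. The cheaper
    of the two applies. Parameter values are illustrative.
    """
    return min(q + length * e, q2 + length * e2)

for k in (1, 20, 100):
    print(k, two_piece_gap_cost(k))
```

Short gaps are scored by the first piece and long gaps by the second, so a long indel is not punished as harshly as under a single affine penalty.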
Size does matter. Small uBAMs save you storage $$. Every read has per-base binned QVs, capped at Q40 (binning does not decrease variant calling performance), and MM/ML tags for 5mC predictions. For one run, 39 GiBytes uBAM for 90 Gbases HiFi yield -> 0.43 bytes/base.
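The bytes-per-base figure is simple division. A quick sketch, with a hypothetical helper name, showing that the quoted value is reproduced when the file size is taken as 39 x 10^9 bytes; reading it strictly as binary GiB gives a slightly higher number.

```python
def bytes_per_base(file_bytes, bases):
    """Storage footprint per sequenced base."""
    return file_bytes / bases

HIFI_YIELD = 90e9  # 90 Gbases HiFi yield (from the thread)

decimal_units = bytes_per_base(39e9, HIFI_YIELD)        # 39 x 10^9 bytes
binary_units = bytes_per_base(39 * 2**30, HIFI_YIELD)   # 39 x 2^30 bytes

print(f"{decimal_units:.2f} bytes/base (decimal units)")
print(f"{binary_units:.2f} bytes/base (binary units)")
```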
All of this made room for a new player: DeepConsensus by @GoogleHealth as part of CCS, running on Revio’s on-instrument compute. With Google’s model improvements to v1.0, a C++ front-end, and @onnxruntime with #TensorRT, 25M on Revio w/ DC is as fast as 8M on SQIIe w/o DC.
Because you will ask the obvious: Given that DeepConsensus is a transformer model and inference on CPU is slow, there’s currently no path forward to push a software update to SQIIe. CCS with integrated DeepConsensus stays a Revio exclusive on-instrument solution.
Not only does Revio generate one 30-fold human genome per SMRT Cell 25M, the data quality is also as good as or better than SQIIe. More yield and better variant calling results? Too good to be true? No! Initial DeepVariant results of one Revio 🏆 vs three SQIIe SMRT Cells.
With this fully integrated solution, DeepConsensus-improved HiFi reads are the real deal, annotated with HiFi kinetics enabling on-instrument 5mC calling and demultiplexing.
Want to have a look at the first HiFi reads that came off Revio? Check out five HG002/3/4 runs at downloads.pacbcloud.com/public/revio/2…. Each includes 5mC-annotated HiFi reads, alignments, and small variants + SV VCFs. We also included a new Revio DeepVariant model, so you can reproduce the results.
One more thing, we’ve improved our demultiplexing tool lima to increase barcode yield from ~97% to up to ~99.5% with similar PPV. Stay tuned, I'm not done yet.
More info on the currently publicly available CCS version at ccs.how