End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
biorxiv.org/content/10.110…
A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh, @JustasDauparas, @countablyfinite, @keitokiddo, @srush_nlp & @pkoo562 (1/8)
Many methods like GREMLIN, MSA_transformer, RosetTTAFold and AlphaFold rely on input MSA generated by non-differentiable methods. (2/8)
We ask the question, what if we make the red arrow differentiable and optimize end-to-end. (3/8)
To accomplish this, we implement a differentiable alignment module (LAM). More specifically a vectorized/ striped smith-waterman via #JAX that is extremely fast (4/8)

Special thanks to @jakevdp for #JAX help! 😎
github.com/google/jax/dis…
Given AlphaFold and LAM are conveniently implemented in #JAX, as a proof-of-concept, we backprop through AlphaFold and LAM to maximize the confidence metrics (pLDDT and pAE) (5/8)
Maximizing pLDDT (and potentially "learning" a more optimal MSA) often increases structure prediction over our initial input MSAs. (6/8)
LAM also allows us to convert GREMLIN into SMURF (Smooth Markov Unaligned Random Field) that simultaneously learns an MSA, coevolution and conservation of a given rna/protein family. (7/8)
Learning the MSA+Coevolution end-to-end matches and sometimes exceeds the performance of precomputed MSA on proteins and RNA for task of contact prediction. (8/8)
We'll make the code public in a day or two. The owner of our shared GitHub account is currently traveling. 😂
The source code is now public: 👀
github.com/spetti/SMURF
@jakevdp Note: We are not the first to implement a differentiable alignment module in bio.

Previous Implementations:
bmcbioinformatics.biomedcentral.com/articles/10.11…

Most recent iterations:
pytorch-struct:
github.com/harvardnlp/pyt…

deepblast (cuda)
biorxiv.org/content/10.110…

julia:
live.juliacon.org/talk/QB8EC8
@jakevdp Oops! Thanks to @thesteinegger for pointing out we had actually implemented an "anti-diagonal" not a "striped" vectorization of smith-waterman.

First described by Wozniak et al. Using video-oriented instructions to speed up sequence comparison. (1997)

bmcbioinformatics.biomedcentral.com/articles/10.11…

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Sergey Ovchinnikov

Sergey Ovchinnikov Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!

:(