Many methods, like GREMLIN, MSA Transformer, RoseTTAFold, and AlphaFold, rely on input MSAs generated by non-differentiable alignment methods. (2/8)
We ask: what if we made the red arrow (the alignment step) differentiable and optimized end-to-end? (3/8)
To accomplish this, we implement a learned alignment module (LAM). More specifically, a vectorized/striped Smith-Waterman in #JAX that is extremely fast. (4/8)
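For a flavor of what a differentiable Smith-Waterman can look like, here is a minimal sketch, not our actual implementation: max is replaced with a temperature-scaled logsumexp so gradients flow, and the DP sweeps anti-diagonals (the scheme from the correction downthread) so each wavefront vectorizes. Function and argument names are illustrative.

```python
import jax
import jax.numpy as jnp

NEG = -1e9  # effectively -inf inside logsumexp

def smooth_sw(scores, gap=-3.0, temp=1.0):
    """Smoothed local alignment score (differentiable Smith-Waterman).
    scores: (L1, L2) match-score matrix for one pair of sequences.
    max() -> temp * logsumexp(. / temp); each anti-diagonal depends only
    on the previous two, so the whole wavefront updates at once."""
    L1, L2 = scores.shape
    ii = jnp.arange(L1)
    sls = lambda x: temp * jax.nn.logsumexp(x / temp, axis=0)
    shift = lambda d: jnp.concatenate([jnp.full(1, NEG), d[:-1]])  # index i-1

    def step(carry, k):
        prev1, prev2, acc = carry            # diagonals k-1 and k-2; running end-score
        jj = k - ii                          # cell (i, j) with i + j == k
        valid = (jj >= 0) & (jj < L2)
        s_k = jnp.where(valid, scores[ii, jnp.clip(jj, 0, L2 - 1)], NEG)
        diag = jnp.where((ii == 0) | (jj == 0), 0.0, shift(prev2))  # H[i-1, j-1]
        h = sls(jnp.stack([jnp.zeros(L1),    # restart (local alignment)
                           diag + s_k,       # match/mismatch
                           shift(prev1) + gap,   # gap in one sequence
                           prev1 + gap]))        # gap in the other
        h = jnp.where(valid, h, NEG)
        acc = sls(jnp.stack([acc, sls(h)]))  # soft max over alignment end points
        return (h, prev1, acc), None

    init = jnp.full(L1, NEG, dtype=scores.dtype)
    (_, _, score), _ = jax.lax.scan(step,
                                    (init, init, jnp.array(NEG, scores.dtype)),
                                    jnp.arange(L1 + L2 - 1))
    return score

# A soft alignment matrix falls out of autodiff:
# aln = jax.grad(smooth_sw)(scores)   # (L1, L2) expected match usage
```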
Since AlphaFold and LAM are both conveniently implemented in #JAX, as a proof of concept we backprop through AlphaFold and LAM to maximize the confidence metrics (pLDDT and pAE). (5/8)
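Conceptually the optimization loop looks like the sketch below. `predict_confidence` is a hypothetical stand-in for the frozen LAM + AlphaFold pipeline (the real pLDDT/pAE heads are far more involved); a toy differentiable function is swapped in so the example runs.

```python
import jax
import jax.numpy as jnp

def predict_confidence(seqs_soft):
    """Stand-in for LAM -> AlphaFold -> pLDDT with frozen weights. Any
    differentiable map from soft sequence inputs to a confidence works
    for this sketch; here a toy function so it executes."""
    return 100.0 * jnp.exp(-jnp.mean(seqs_soft ** 2))

# maximize confidence = minimize its negative, by gradient on the *inputs*
loss = lambda logits: -predict_confidence(jax.nn.softmax(logits, -1))
grad_fn = jax.jit(jax.grad(loss))

# toy input: 16 sequences, length 128, 20 amino acids + gap
logits = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (16, 128, 21))
for _ in range(50):
    logits = logits - 0.1 * grad_fn(logits)
```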
Maximizing pLDDT (and potentially "learning" a more optimal MSA) often improves structure prediction accuracy over our initial input MSAs. (6/8)
LAM also allows us to convert GREMLIN into SMURF (Smooth Markov Unaligned Random Field), which simultaneously learns an MSA, coevolution, and conservation for a given RNA/protein family. (7/8)
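For the GREMLIN half, the loss is roughly a Markov random field pseudo-likelihood like the sketch below (names illustrative, not our exact code). In SMURF the fixed one-hot MSA would be replaced by LAM's soft alignment output, so the same loss trains the aligner end-to-end.

```python
import jax
import jax.numpy as jnp

def mrf_pseudo_nll(params, msa, lam=0.01):
    """GREMLIN-style MRF pseudo-likelihood on a one-hot MSA of shape (N, L, A).
    v: (L, A) per-position conservation; w: (L, A, L, A) coevolution couplings."""
    v, w = params["v"], params["w"]
    w = 0.5 * (w + w.transpose(2, 3, 0, 1))             # symmetric couplings
    L = w.shape[0]
    w = w * (1.0 - jnp.eye(L))[:, None, :, None]        # no self-coupling
    logits = v + jnp.einsum("iajb,njb->nia", w, msa)    # predict i from all j != i
    nll = -(msa * jax.nn.log_softmax(logits, axis=-1)).sum((1, 2)).mean()
    return nll + lam * jnp.sum(jnp.square(w)) + lam * jnp.sum(jnp.square(v))

# usage sketch: init and train with any optimizer
L, A = 100, 21
params = {"v": jnp.zeros((L, A)), "w": jnp.zeros((L, A, L, A))}
loss_and_grad = jax.jit(jax.value_and_grad(mrf_pseudo_nll))
# contacts are then scored from the APC-corrected norms of the blocks w[i, :, j, :]
```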
Learning the MSA + coevolution end-to-end matches, and sometimes exceeds, the performance of precomputed MSAs on proteins and RNA for the task of contact prediction. (8/8)
We'll make the code public in a day or two. The owner of our shared GitHub account is currently traveling. 😂
@jakevdp Oops! Thanks to @thesteinegger for pointing out that we had actually implemented an "anti-diagonal", not a "striped", vectorization of Smith-Waterman.
It was first described in Wozniak (1997), "Using video-oriented instructions to speed up sequence comparison."