Sergey Ovchinnikov
Scientist, Assistant Professor @MITBiology, #FirstGen, ProteinBERTologist, 🇺🇦
Jun 30 8 tweets 3 min read
Weekend project: Comparing ESM3 from @EvoscaleAI to ESM2 and inv_cov. The ultimate test of a protein language model is how well the pairwise dependencies it learns correlate with structure. (1/8)

Traditional methods approximate this signal by taking a multiple sequence alignment of a protein family and computing the inverse covariance matrix. For pLMs, we extract it by computing a Jacobian over the sequence track (for ESM3, the structure track is masked). (2/8)
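A minimal sketch of the categorical Jacobian idea (a hedged reconstruction, not the exact notebook code; `logits_fn` is a hypothetical stand-in for a pLM forward pass mapping an integer-encoded sequence of length L to an (L, 20) logit array): substitute every token at every position, record how all output logits move, and reduce each 20x20 block to a single pairwise score.

import numpy as np

def categorical_jacobian(logits_fn, seq, n_tokens=20):
    L = len(seq)
    base = logits_fn(seq)                            # (L, 20) reference logits
    J = np.zeros((L, n_tokens, L, n_tokens))
    for i in range(L):
        for a in range(n_tokens):
            mut = np.array(seq)
            mut[i] = a                               # substitute token a at position i
            J[i, a] = logits_fn(mut) - base          # effect on every output logit
    # collapse the 20x20 amino-acid block for each pair (i, j) with a Frobenius norm
    scores = np.sqrt((J ** 2).sum(axis=(1, 3)))
    scores = (scores + scores.T) / 2                 # symmetrize
    # average product correction (APC), as commonly used for contact maps
    ap = scores.mean(0, keepdims=True) * scores.mean(1, keepdims=True) / scores.mean()
    return scores - ap

The resulting L x L map can then be compared to the contact map of the experimental structure.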
Jun 16 9 tweets 4 min read
Towards the end of the presentation I went down a bit of a rabbit hole trying to demonstrate that AF3 may still be learning to invert the covariance matrix, which is needed to extract the coevolution signal from the input multiple sequence alignment (MSA). (1/9)
For context, traditional methods like GREMLIN extract coevolution from the input MSA. If you assume the data are non-categorical (i.e., you treat the one-hot encoded MSA as continuous), you can approximate the coevolution signal via the inverse covariance matrix. (2/9) arxiv.org/abs/1906.02598
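For the curious, a rough sketch of that inverse-covariance approximation (assumptions: the MSA is an (N, L) integer array over 20 states, no sequence reweighting, and a simple shrinkage term so the covariance is invertible; GREMLIN itself fits a full Potts model instead):

import numpy as np

def inv_cov_contacts(msa, q=20, shrink=4.5):
    N, L = msa.shape
    X = np.eye(q)[msa].reshape(N, L * q)             # one-hot encode, flatten to (N, L*q)
    X = X - X.mean(0)                                # center the features
    cov = (X.T @ X) / N
    cov += shrink / np.sqrt(N) * np.eye(L * q)       # shrinkage so the inverse exists
    prec = np.linalg.inv(cov).reshape(L, q, L, q)
    scores = np.sqrt((prec ** 2).sum(axis=(1, 3)))   # norm of each q x q block
    np.fill_diagonal(scores, 0)
    ap = scores.mean(0, keepdims=True) * scores.mean(1, keepdims=True) / scores.mean()
    return scores - ap                               # APC-corrected coupling map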
Mar 1 7 tweets 3 min read
I tried running our categorical Jacobian method (for extracting the coevolution signal from language models) on Evo from @BrianHie @pdhsu on the 16S rRNA. It appears to pick up on local hairpins 🤓 (1/3)

No strong long-range contacts though... (2/3)
Nov 27, 2023 10 tweets 4 min read
A recent preprint from @Lauren_L_Porter shows that it's sometimes possible to sample the alternative conformation of metamorphic proteins by removing the MSA. Though I think this is a very interesting observation, I disagree that coevolution is not used when it is provided. (1/9) https://www.biorxiv.org/content/10.1101/2023.11.21.567977v2

We believe AlphaFold has learned some approximation of an "energy function" and a limited ability to explore. But this is often not enough to find the correct conformation, and often an MSA is required to reduce the search space. (2/9) https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.129.238101
Feb 27, 2023 4 tweets 2 min read
Puzzle: The residue index encodes the position embedding for models like AlphaFold. This residue index is converted into an offset matrix. (1/3)

What do you think will happen if this offset matrix is used instead? [answer will be posted later] (2/3)
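For anyone unfamiliar with the setup, a small sketch (assuming AlphaFold-2-style relative position features, clipped at ±32) of how a residue index becomes an offset matrix:

import numpy as np

def relpos(residue_index, max_rel=32):
    # residue_index: (L,) integer array; compute all pairwise offsets
    offset = residue_index[:, None] - residue_index[None, :]
    # clip to [-max_rel, max_rel], shift to [0, 2*max_rel], then one-hot encode
    clipped = np.clip(offset, -max_rel, max_rel) + max_rel
    return np.eye(2 * max_rel + 1)[clipped]          # (L, L, 65) pair features

For a contiguous chain, residue_index is simply np.arange(L); the puzzle is what happens when the model is fed a different offset matrix.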
Feb 26, 2023 5 tweets 3 min read
AlphaFold inverted to hallucinate de novo proteins of up to 600 amino acids in length 🤯

(animation below shows the designed protein docked into CryoEM density)

Exciting work with:
@chrisfrank662, @AKhoshouei, Yosta de Stigter, Dominik Schiewitz, @ShihaoFeng18, @hendrik_dietz
Jan 30, 2023 7 tweets 3 min read
We've been working on adding AlphaFold v2.3.1 support to ColabFold. 😎 Here is the notebook for those interested in testing: colab.research.google.com/github/sokrypt… (1/5)

The major update is AlphaFold_multimer_v3, an updated multimer model from @DeepMind. Initial tests from @milot_mirdita show an improvement over v2, though it's unclear whether the improvements are due to the new params or the protocol (run for more recycles, with early stopping). (2/5)
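In case it helps picture the protocol, here is a rough sketch of "more recycles with early stopping" (hypothetical `predict_once` function; not ColabFold's actual code): keep recycling until the predicted CA coordinates stop moving, up to a maximum count.

import numpy as np

def predict_with_early_stop(predict_once, inputs, max_recycles=20, tol=0.5):
    prev_out, prev_ca = None, None
    for r in range(max_recycles + 1):
        outputs = predict_once(inputs, prev=prev_out)   # one recycle iteration
        ca = outputs["ca_positions"]                    # (L, 3) alpha-carbon coordinates
        if prev_ca is not None and np.sqrt(((ca - prev_ca) ** 2).sum(-1)).mean() < tol:
            break                                       # structure stopped changing
        prev_out, prev_ca = outputs, ca
    return outputs, r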
Dec 10, 2022 7 tweets 3 min read
Some exciting results from CASP15! (1/6)

Though Z-scores can be a little misleading: if everyone is about the same on average, and one group does really well on a particular target compared to everyone else, they get a HUGE boost. Here are the same data (same order, but plotting average GDT_TS). (2/6)
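A toy illustration (made-up numbers, not CASP data) of the effect: if all groups score nearly the same on most targets, a single outlier target dominates the summed Z-score even though the mean GDT_TS barely differs.

import numpy as np

gdt = np.array([               # rows = groups, columns = targets
    [60.0, 60.0, 60.0, 90.0],  # group A: identical to the others except one big win
    [60.0, 60.0, 60.0, 60.0],  # group B
    [60.0, 60.0, 60.0, 60.0],  # group C
])
z = (gdt - gdt.mean(0)) / (gdt.std(0) + 1e-8)   # per-target Z-scores
print(z.sum(1))       # group A's summed Z-score towers over B and C
print(gdt.mean(1))    # yet its mean GDT_TS is only 7.5 points higher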
Dec 6, 2022 11 tweets 5 min read
Inspired by some of our work on AF2Rank (showing AF can be used to denoise template inputs), and by RFdiffusion, I tried hacking AlphaFold into a diffusion model for generating backbones. 😎 Only 15 lines of code to implement in ColabDesign 🤓
colab.research.google.com/github/sokrypt…
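For intuition, a hand-wavy reconstruction (hypothetical `af_predict` that maps template coordinates to predicted coordinates; not the actual 15 lines): use the structure predictor as the denoiser in a diffusion-like loop, starting from random coordinates and mixing back a shrinking amount of noise each step.

import numpy as np

def hallucinate_backbone(af_predict, length, steps=50, sigma_max=10.0):
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=sigma_max, size=(length, 3))    # start from pure noise
    for t in range(steps):
        denoised = af_predict(coords)                         # model "denoises" the template
        sigma = sigma_max * (1.0 - (t + 1) / steps)           # linearly decaying noise level
        coords = denoised + rng.normal(scale=sigma, size=denoised.shape)
    return af_predict(coords)                                 # final clean prediction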
Oct 16, 2022 4 tweets 2 min read
Job-application-procrastination-project:
ProteinMPNN in jax! 😅

GPU=A6000, length=2382, seqs=32
pytorch=3m22s
jax=17.9s
jax=4.46s (vmap; see the sketch below)

length=100, seqs=5000
jax=2s

Special thanks to Shihao Feng, @JustasDauparas and @sim0nsays

colab.research.google.com/github/sokrypt…

(1/3)

With 10 lines of code you can now design sequences with ProteinMPNN and "evaluate" them with AlphaFold/AF2Rank (from @jamesproney).
github.com/sokrypton/Cola…
(2/3)
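The vmap numbers above come from writing the model for a single sequence and letting jax batch it. A minimal sketch (with a hypothetical `score_one` placeholder, not the actual ProteinMPNN port):

import jax
import jax.numpy as jnp

def score_one(params, coords, seq):
    # placeholder "model"; in the real port this would be the ProteinMPNN decoder
    logits = (coords @ params["w"]) @ params["b"]    # (L,) toy per-position logits
    return -(logits * seq).sum()

# vectorize over a batch of sequences (axis 0 of `seq` only) and compile once
score_batch = jax.jit(jax.vmap(score_one, in_axes=(None, None, 0)))

L, d, B = 100, 8, 5000
params = {"w": jnp.ones((3, d)), "b": jnp.ones(d)}
coords, seqs = jnp.zeros((L, 3)), jnp.ones((B, L))
print(score_batch(params, coords, seqs).shape)       # (5000,)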
Aug 4, 2022 5 tweets 3 min read
Found an example from @Alexis_Verger that OmegaFold fails on (and has low confidence). But if you hack OmegaFold to use an MSA input, it gets it right (and has high confidence)!
Aug 3, 2022 4 tweets 2 min read
Seems to be working 😅
Now to test the limits on Colab!
colab.research.google.com/github/sokrypt…

In the spirit of hacking methods to do things they were not trained to do, I added support for chain breaks using the residue index offset trick! 😇
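Concretely, the trick is to insert a large jump into residue_index at each chain boundary, well beyond the ±32 clipping of the relative position encoding, so the model treats the segments as disconnected. A minimal sketch (the gap of 200 is one common choice):

import numpy as np

def residue_index_with_breaks(chain_lengths, gap=200):
    idx, offset = [], 0
    for L in chain_lengths:
        idx.append(np.arange(L) + offset)
        offset += L + gap                 # jump past the clipping range between chains
    return np.concatenate(idx)

# two chains of 50 and 60 residues -> indices 0..49 and 250..309
print(residue_index_with_breaks([50, 60]))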
Jun 15, 2022 4 tweets 2 min read
We (including @thesteinegger and @milot_mirdita) have updated ColabFold to use the latest optimized AlphaFold implementation, which reduces compile time from ~4.5 minutes to ~30 seconds! We've confirmed the results are identical for both monomer and multimer predictions. Try it out! (1/2)

Here is the pseudo-code of the changes we made and their effect on compile time. (2/2)
Jun 12, 2022 10 tweets 4 min read
For my latest attempt at introducing proteins to students, I made a Google Colab notebook that predicts protein structure from a single sequence. I asked the students to tweak the sequence to get a helix, or two helices, or... (1/5)
colab.research.google.com/github/sokrypt…

I gave them the following cheat sheet: 😅 (2/5)
Oct 24, 2021 12 tweets 8 min read
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
biorxiv.org/content/10.110…
A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh, @JustasDauparas, @countablyfinite, @keitokiddo, @srush_nlp & @pkoo562 (1/8)

Many methods like GREMLIN, MSA Transformer, RoseTTAFold, and AlphaFold rely on an input MSA generated by non-differentiable methods. (2/8)
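For a flavor of the core trick, a simplified sketch of a smoothed Smith-Waterman (linear gap penalty, soft maximum via logsumexp at temperature T; not the paper's exact formulation): because every operation is differentiable, the gradient of the score with respect to the substitution matrix acts as a soft alignment.

import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def smooth_sw(sub, gap=1.0, T=1.0):
    # sub: (n, m) substitution scores for aligning seq1[i] with seq2[j]
    n, m = sub.shape
    smax = lambda *x: T * logsumexp(jnp.stack(x) / T)    # soft maximum of the arguments
    H = jnp.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h = smax(jnp.asarray(0.0),
                     H[i - 1, j - 1] + sub[i - 1, j - 1],  # (mis)match
                     H[i - 1, j] - gap,                    # gap in seq2
                     H[i, j - 1] - gap)                    # gap in seq1
            H = H.at[i, j].set(h)
    return T * logsumexp(H / T)                            # soft best local score

# gradient w.r.t. the substitution matrix gives an (n, m) soft alignment
soft_alignment = jax.grad(smooth_sw)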