Sergey Ovchinnikov 🇺🇦
Scientist, Assistant Professor @MITBiology, #FirstGen, ProteinBERTologist
Jun 30 8 tweets 3 min read
Weekend project: Comparing ESM3 from @EvoscaleAI to ESM2 and inv_cov. The ultimate test of a protein language model is how well the pairwise dependencies it learns correlate with structure. (1/8)

Traditional methods approximate this signal by taking a multiple sequence alignment of a protein family and computing the inverse covariance matrix. For pLMs we extract it by computing a Jacobian over the sequence track (for ESM3, the structure track is masked). (2/8)
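A minimal sketch of what "computing a Jacobian over the sequence track" can look like, done by full single-site substitutions rather than autodiff; `logits_fn` is a hypothetical stand-in for the pLM forward pass (ESM2, or ESM3 with the structure track masked), and the APC step mirrors what is normally done for coevolution maps:

```python
import numpy as np

def categorical_jacobian_contacts(logits_fn, seq_onehot):
    """Pairwise dependency map from a pLM via substitution-based Jacobian.

    logits_fn  : hypothetical callable mapping a one-hot sequence (L, A)
                 to per-position logits (L, A).
    seq_onehot : (L, A) one-hot encoding of the query sequence.
    Returns an (L, L) map whose large entries tend to coincide with contacts.
    """
    L, A = seq_onehot.shape
    f0 = logits_fn(seq_onehot)                       # (L, A) reference logits
    J = np.zeros((L, A, L, A))
    for i in range(L):
        for a in range(A):
            x = seq_onehot.copy()
            x[i] = 0.0
            x[i, a] = 1.0                            # substitute amino acid a at position i
            J[i, a] = logits_fn(x) - f0              # response of every position's logits
    contacts = np.sqrt(np.square(J).sum(axis=(1, 3)))    # Frobenius norm over AA dims -> (L, L)
    contacts = (contacts + contacts.T) / 2               # symmetrize
    np.fill_diagonal(contacts, 0.0)
    # average-product correction (APC), as used for coevolution maps
    ap = contacts.mean(0, keepdims=True) * contacts.mean(1, keepdims=True) / contacts.mean()
    return contacts - ap
```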
Jun 16 9 tweets 4 min read
Towards the end of the presentation I went down a bit of a rabbit hole trying to demonstrate that AF3 may still be learning to invert the covariance matrix, which is needed to extract the coevolution signal from the input multiple sequence alignment (MSA). (1/9)
For context, traditional methods like GREMLIN extract coevolution from the input MSA. If you make the simplifying assumption that the data is continuous rather than categorical, you can approximate the coevolution signal via the inverse covariance matrix (2/9). arxiv.org/abs/1906.02598
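A minimal sketch of that Gaussian approximation, assuming an integer-encoded MSA as input; the regularization value is illustrative and sequence reweighting is omitted for brevity:

```python
import numpy as np

def inv_cov_contacts(msa, n_states=21, lam=4.8):
    """GREMLIN-style contacts from a regularized inverse covariance matrix.

    msa : (N, L) integer-encoded alignment (0..20, with 20 = gap) -- an
          assumption about the input format.
    lam : ridge regularization strength (illustrative value).
    """
    N, L = msa.shape
    X = np.eye(n_states)[msa].reshape(N, L * n_states)            # one-hot, flattened per sequence
    C = np.cov(X.T) + lam / np.sqrt(N) * np.eye(L * n_states)     # regularized covariance
    P = np.linalg.inv(C).reshape(L, n_states, L, n_states)        # precision (inverse covariance)
    contacts = np.sqrt(np.square(P).sum(axis=(1, 3)))             # Frobenius norm over AA dims
    np.fill_diagonal(contacts, 0.0)
    # average-product correction
    ap = contacts.mean(0, keepdims=True) * contacts.mean(1, keepdims=True) / contacts.mean()
    return contacts - ap
```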
Nov 27, 2023 10 tweets 4 min read
A recent preprint from @Lauren_L_Porter shows that it's sometimes possible to sample the alternative conformation of metamorphic proteins by removing the MSA. Though I think this is a very interesting observation, I disagree that coevolution is not used when it is provided. (1/9) https://www.biorxiv.org/content/10.1101/2023.11.21.567977v2

We believe AlphaFold has learned some approximation of an "energy function" and a limited ability to explore. But this is often not enough to find the correct conformation, and an MSA is often required to reduce the search space. (2/9) https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.129.238101
Feb 27, 2023 4 tweets 2 min read
Puzzle: The residue index encodes the position embedding for models like AlphaFold. This residue index is converted into an offset matrix. (1/3)

What do you think will happen if this offset matrix is used instead? [answer will be posted later] (2/3)
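For reference, a minimal sketch of how a residue index becomes the offset matrix used for relative position encoding; the clipping value of 32 is my recollection of AlphaFold's default and should be treated as an assumption:

```python
import numpy as np

def relative_position_features(residue_index, max_rel=32):
    """Offset matrix for AlphaFold-style relative position encoding.

    residue_index : (L,) integer residue indices (1, 2, 3, ...).
    max_rel       : clipping range for the offsets (assumed value).
    """
    residue_index = np.asarray(residue_index)
    offset = residue_index[:, None] - residue_index[None, :]     # (L, L) pairwise offsets
    clipped = np.clip(offset, -max_rel, max_rel) + max_rel       # shift into [0, 2*max_rel]
    return np.eye(2 * max_rel + 1)[clipped]                      # one-hot (L, L, 2*max_rel+1)
```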
Feb 26, 2023 5 tweets 3 min read
AlphaFold inverted to hallucinate de novo proteins of up to 600 amino acids in length 🤯

(animation below shows the designed protein docked into CryoEM density)

Exciting work with:
@chrisfrank662, @AKhoshouei, Yosta de Stigter, Dominik Schiewitz, @ShihaoFeng18, @hendrik_dietz
Jan 30, 2023 7 tweets 3 min read
We've been working on adding AlphaFold v2.3.1 support to ColabFold. 😎 Here is the notebook for those interested in testing: colab.research.google.com/github/sokrypt… (1/5)

The major update is AlphaFold_multimer_v3, an updated multimer model from @DeepMind. Initial tests from @milot_mirdita show an improvement over v2, though it's unclear whether the improvements are due to the new params or the protocol (run for more recycles, with early stop). (2/5)
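A hedged sketch of what "more recycles, with early stop" means in practice; `run_model` is a hypothetical stand-in for one multimer model call, and the convergence criterion/threshold shown here is illustrative rather than ColabFold's exact default:

```python
import numpy as np

def predict_with_early_stop(run_model, features, max_recycles=20, tol=0.5):
    """Recycle until the predicted structure stops changing.

    run_model : hypothetical callable (features, prev) -> prediction dict
                containing a 'ca_coords' (L, 3) array.
    tol       : stop when the mean change in pairwise CA distances between
                consecutive recycles falls below this value (Angstrom).
    """
    prev, last_dist, prediction = None, None, None
    for n in range(max_recycles + 1):
        prediction = run_model(features, prev)
        ca = prediction["ca_coords"]
        dist = np.linalg.norm(ca[:, None] - ca[None, :], axis=-1)   # (L, L) CA distance map
        if last_dist is not None and np.abs(dist - last_dist).mean() < tol:
            break                                                    # converged: stop recycling
        prev, last_dist = prediction, dist
    return prediction, n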
Dec 10, 2022 7 tweets 3 min read
Some exciting results from CASP15! (1/6)

Though Z-scores can be a little misleading. If everyone is about the same on average, and one group does really well on a particular target compared to everyone else, they get a HUGE boost. Here are the same data (same order, but plotting average GDT_TS). (2/6)
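A simplified sketch of the point, comparing a per-target Z-score sum against plain average GDT_TS; the clipping of negative Z-scores is my assumption of the CASP convention, and the official protocol has additional steps:

```python
import numpy as np

def casp_style_ranking(gdt, clip=-2.0):
    """Why per-target Z-scores can reward a single outlier target.

    gdt  : (n_groups, n_targets) GDT_TS scores.
    clip : floor applied to negative Z-scores before summing (assumed convention).
    Returns (summed Z-score per group, mean GDT_TS per group).
    """
    z = (gdt - gdt.mean(axis=0)) / (gdt.std(axis=0) + 1e-8)   # Z-score within each target
    z_sum = np.clip(z, clip, None).sum(axis=1)                # one outlier target -> big boost
    return z_sum, gdt.mean(axis=1)                            # compare against average GDT_TS
```

For example, if every group scores ~60 on every target but one group gets 90 on a single target, its summed Z-score jumps well ahead even though its average GDT_TS barely moves.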
Dec 6, 2022 11 tweets 5 min read
Inspired by some of our work on AF2Rank (showing AF can be used to denoise template inputs) and RFdiffusion, I tried hacking AlphaFold to be a diffusion model for generating backbones. 😎 Only 15 lines of code to implement in ColabDesign 🤓
colab.research.google.com/github/sokrypt…
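A hedged sketch of the kind of loop this describes: AlphaFold acts as the denoiser on a noisy template, and the prediction is re-noised with a shrinking scale each step. `af_predict_with_template` is a hypothetical callable, and the noise schedule/mixing rule here are illustrative rather than the notebook's actual 15 lines:

```python
import numpy as np

def af_backbone_diffusion(af_predict_with_template, length, steps=50):
    """AF-as-denoiser loop for generating backbones (illustrative sketch).

    af_predict_with_template : hypothetical callable taking (L, 3) CA coordinates
                               as a template input and returning denoised (L, 3)
                               CA coordinates (AF2Rank-style template denoising).
    """
    coords = np.random.normal(scale=10.0, size=(length, 3))   # start from pure noise
    for t in np.linspace(1.0, 0.0, steps):
        denoised = af_predict_with_template(coords)           # AF "denoises" the template
        noise = np.random.normal(scale=10.0, size=(length, 3))
        coords = (1.0 - t) * denoised + t * noise             # re-noise less at every step
    return coords
```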
Oct 16, 2022 4 tweets 2 min read
Job-application-procrastination-project:
ProteinMPNN in jax! 😅

GPU=A6000, length=2382, seqs=32
pytorch=3m22s
jax=17.9s
jax=4.46s (vmap)

length=100, seqs=5000
jax=2s

Special thanks to Shihao Feng, @JustasDauparas and @sim0nsays

colab.research.google.com/github/sokrypt…

(1/3)

With 10 lines of code you can now design sequences with ProteinMPNN and "evaluate" them with AlphaFold/AF2Rank (from @jamesproney).
github.com/sokrypton/Cola…
(2/3)
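The vmap timings above come from batching a single-sequence function over many sequences and compiling it once. A toy sketch of that pattern (the scoring function here is a placeholder, not the actual ProteinMPNN forward pass):

```python
import jax
import jax.numpy as jnp

def score_one(params, node_features, seq_onehot):
    """Toy stand-in for a single-sequence scoring pass; shapes are illustrative."""
    logits = node_features @ params["W"]          # (L, 20) per-position logits
    logp = jax.nn.log_softmax(logits, axis=-1)
    return jnp.sum(logp * seq_onehot)             # sequence log-likelihood

# vmap over the sequence axis, then jit once; every later batch reuses the compiled program
score_many = jax.jit(jax.vmap(score_one, in_axes=(None, None, 0)))

L, A, S = 100, 20, 5000
params = {"W": 0.01 * jnp.ones((64, A))}
node_features = jnp.ones((L, 64))
seqs = jax.nn.one_hot(jnp.zeros((S, L), dtype=jnp.int32), A)   # (S, L, 20)
scores = score_many(params, node_features, seqs)               # (S,) scored in one fused call
```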
Aug 4, 2022 5 tweets 3 min read
Found an example from @Alexis_Verger that OmegaFold fails on (and has low confidence). But if you hack OmegaFold to use an MSA input, it gets it right (and has high confidence)!
Aug 3, 2022 4 tweets 2 min read
Seems to be working 😅
Now to test the limits on Colab!
colab.research.google.com/github/sokrypt…

In the spirit of hacking methods to do what they were not trained to do, I added support for chain breaks using the residue index offset trick! 😇
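A minimal sketch of the residue index offset trick: a large jump in the residue index between chains makes the relative-position encoding treat them as disconnected. The offset of 200 is the value I recall being used; treat it as an assumption:

```python
import numpy as np

def residue_index_with_chain_breaks(chain_lengths, gap=200):
    """Build a residue index with large gaps at chain boundaries."""
    index, start = [], 0
    for L in chain_lengths:
        index.append(np.arange(L) + start)
        start += L + gap          # insert a big offset before the next chain
    return np.concatenate(index)

# e.g. two chains of 60 and 40 residues fed to a single-chain model
residue_index = residue_index_with_chain_breaks([60, 40])
```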
Jun 15, 2022 4 tweets 2 min read
We've (including @thesteinegger and @milot_mirdita) updated ColabFold to use the latest optimized AlphaFold implementation, which reduces compile time from ~4.5 min to ~30 seconds! We've confirmed the results are identical for both monomer and multimer predictions. Try it out! (1/2)

Here is the pseudo-code of the changes we made and their effect on compile time. (2/2)
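The actual pseudo-code was shown in an image not reproduced here. As a generic illustration only (not the ColabFold diff), one common way to cut JAX compile time is to jit a single recycle step and loop in Python, so XLA compiles one small graph and reuses it:

```python
import jax
import jax.numpy as jnp

@jax.jit
def one_recycle(params, features, prev):
    # toy stand-in for a single Evoformer/structure-module pass
    return jnp.tanh(features @ params + prev)

def predict(params, features, num_recycles=3):
    prev = jnp.zeros(features.shape[0])
    for _ in range(num_recycles):       # Python loop reuses the compiled step
        prev = one_recycle(params, features, prev)
    return prev
```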
Jun 12, 2022 10 tweets 4 min read
For my latest attempt at introducing proteins to students, I made a Google Colab notebook that predicts protein structures from a single sequence. I asked the students to tweak the sequence to get a helix, or two helices, or... (1/5)
colab.research.google.com/github/sokrypt…

I gave them the following cheat sheet: 😅 (2/5)
Oct 24, 2021 12 tweets 8 min read
End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
biorxiv.org/content/10.110…
A fun collaboration with Samantha Petti, Nicholas Bhattacharya, @proteinrosh, @JustasDauparas, @countablyfinite, @keitokiddo, @srush_nlp & @pkoo562 (1/8)

Many methods like GREMLIN, MSA_transformer, RoseTTAFold and AlphaFold rely on an input MSA generated by non-differentiable methods. (2/8)
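The core idea behind making the alignment step differentiable is to replace the hard max in the Smith-Waterman recursion with a temperature-scaled soft max so gradients can flow through the dynamic program. A minimal sketch with a linear gap penalty in plain numpy (the paper's version uses an affine-gap formulation and is vectorized in JAX):

```python
import numpy as np

def smooth_max(values, temp=1.0):
    """Temperature-scaled log-sum-exp: a differentiable relaxation of max()."""
    v = np.asarray(values, dtype=float)
    m = v.max()
    return m + temp * np.log(np.exp((v - m) / temp).sum())

def smooth_smith_waterman(S, gap=1.0, temp=1.0):
    """Local alignment score with the hard max replaced by a soft max.

    S    : (L1, L2) substitution/score matrix for the two sequences
    gap  : linear gap penalty (the paper uses affine gaps)
    temp : temperature; temp -> 0 recovers the standard non-differentiable recursion
    """
    L1, L2 = S.shape
    H = np.zeros((L1 + 1, L2 + 1))
    for i in range(1, L1 + 1):
        for j in range(1, L2 + 1):
            H[i, j] = smooth_max([0.0,
                                  H[i - 1, j - 1] + S[i - 1, j - 1],   # match/mismatch
                                  H[i - 1, j] - gap,                    # gap in sequence 2
                                  H[i, j - 1] - gap],                   # gap in sequence 1
                                 temp)
    return smooth_max(H.ravel(), temp)   # soft max over all cells as the final score
```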