Mohammed AlQuraishi
MLing biomolecules en route to structural systems biology. Asst Prof of Systems Biology @Columbia. Prev. @Harvard SysBio; @Stanford Genetics, Stats.
Jun 13, 2023 7 tweets 4 min read
We built a new diffusion protein design model named Genie. We preprinted it a while ago (soon after the RFDiffusion and Chroma preprints) but kept mum due to an embargo. The final ICML version (a major update) is out, with code at github.com/aqlaboratory/g… and the paper at arxiv.org/abs/2301.12485 (1/7)

Genie is trained from scratch to generate structures without a pretrained structure prediction model (similar to FrameDiff, arxiv.org/abs/2302.02277 by @json_yim et al., another contemporaneous method), making it extremely lightweight. (2/7)
Jul 24, 2022 10 tweets 2 min read
Last week’s OmegaFold (biorxiv.org/content/10.110…) and ESMFold (biorxiv.org/content/10.110…) present contrasting takes on how to fuse language models (LMs) with structure prediction. A short 🧵 1/9

ESMFold is scale-focused: it trains a much more performant LM (ESM2) but couples it to an unchanged AF2. Definitely in a "scale is all you need" mindset. 2/9
Jun 22, 2022 13 tweets 6 min read
We have successfully trained OpenFold, our trainable PyTorch implementation of AlphaFold2, from scratch. The new OpenFold (OF) (slightly) outperforms AlphaFold2 (AF2). I believe this is the first publicly available reproduction of AF2. We learned a lot. A 🧵 1/12

First off: model weights, training code and a Colab notebook are here: github.com/aqlaboratory/o…. We are also making available a training set of 400K unique MSAs & predicted structures for self-distillation. These live in the Registry of Open Data on AWS: registry.opendata.aws/openfold 2/12
Nov 30, 2020 5 tweets 2 min read
CASP14 #s just came out and they’re astounding: DeepMind looks to have solved protein structure prediction. Median GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd best CASP13 struct scored 92.8 (out of 100). Median RMSD is 2.1Å. I think it's over predictioncenter.org/casp14/zscores…

These are for single domains (not whole proteins), and there are a few poor predictions. So corner cases remain, but the core problem appears solved: 88% of predictions are <4Å, 76% <3Å, 46% <2Å. Unlike last time, when there was some competition, this time AF2 was best for 88/97 targets.
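For context on the GDT_TS numbers above: GDT_TS averages, over distance cutoffs of 1, 2, 4 and 8 Å, the percentage of residues (Cα atoms) in a model that lie within that cutoff of the reference structure, giving a score out of 100. The sketch below is a simplified illustration, assuming per-residue Cα distances have already been computed after a single fixed superposition. The official CASP scoring (via LGA) searches for a separate best superposition at each cutoff, so this version will generally come out at or below the official score. The function name `gdt_ts` and the use of `<=` at each cutoff are assumptions made for illustration.

```python
from typing import Sequence

# Distance cutoffs (in Å) averaged by GDT_TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS on a 0-100 scale (hypothetical helper).

    `distances` holds per-residue Cα deviations (Å) between model and
    reference after ONE fixed superposition. Official scoring optimizes
    the superposition separately per cutoff; this sketch does not.
    """
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one residue")
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in GDT_TS_CUTOFFS]
    # Average the four fractions and express as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 10.0]))  # 56.25
```

In the usage example, one of four residues is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, so the score is 100 × (0.25 + 0.5 + 0.75 + 0.75) / 4 = 56.25. A median near 92 therefore means that, for a typical target, nearly all residues sit within a few ångströms of the experimental structure.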