Mohammed AlQuraishi
MLing biomolecules en route to structural systems biology. Asst Prof of Systems Biology and CS @Columbia. Prev. @Harvard SysBio; @Stanford Genetics, Stats.
Oct 28 · 11 tweets · 4 min read
OpenFold3-preview (OF3p) is out: a sneak peek of our AF3-based structure prediction model. Our aim for OF3 is full AF3 parity for every modality. We now believe we have a clear path towards this goal and are releasing OF3p to enable building in the OF3 ecosystem. More👇

OF3p is already quite good. For any modality, it is comparable to (or better than) the best existing OSS model for that modality. On RNA, where we spent considerable effort, it is at AF3 parity. RNA is a challenging modality where all models (incl. AF3) leave much room for improvement.
Jun 13, 2023 · 7 tweets · 4 min read
We built a new diffusion protein design model named Genie. We preprinted it a while ago (soon after the RFDiffusion and Chroma preprints) but kept mum due to an embargo. The final ICML version (a major update) is out, with code at github.com/aqlaboratory/g… and the paper at arxiv.org/abs/2301.12485 (1/7)

Genie is trained from scratch to generate structures without a pretrained structure prediction model (similar to FrameDiff: arxiv.org/abs/2302.02277 by @json_yim et al., another contemporaneous method), making it extremely lightweight. (2/7)
Jul 24, 2022 · 10 tweets · 2 min read
Last week’s OmegaFold (biorxiv.org/content/10.110…) and ESMFold (biorxiv.org/content/10.110…) present contrasting takes on how to fuse language models (LMs) with structure prediction. A short 🧵1/9

ESMFold is scale-focused, training a much more performant LM (ESM2) but coupling it to an unchanged AF2. Definitely in the "scale is all you need" mindset. 2/9
Jun 22, 2022 · 13 tweets · 6 min read
We have successfully trained OpenFold, our trainable PyTorch implementation of AlphaFold2, from scratch. The new OpenFold (OF) (slightly) outperforms AlphaFold2 (AF2). I believe this is the first publicly available reproduction of AF2. We learned a lot. A 🧵1/12

First off: model weights, training code and a Colab notebook are here: github.com/aqlaboratory/o…. We are also making available a training set of 400K unique MSAs and predicted structures for self-distillation. These live in the Registry of Open Data on AWS: registry.opendata.aws/openfold 2/12
Nov 30, 2020 · 5 tweets · 2 min read
CASP14 #s just came out and they’re astounding: DeepMind looks to have solved protein structure prediction. Median GDT_TS went from 68.5 (CASP13) to 92.4!!!! Cf. their 2nd-best CASP13 structure scored 92.8 (out of 100). Median RMSD is 2.1Å. I think it's over. predictioncenter.org/casp14/zscores…

These are for single domains, not whole proteins, and there are a few poor predictions. So corner cases remain, but the core problem appears solved: 88% of predictions are <4Å, 76% <3Å, 46% <2Å. Unlike last time, when there was some competition, this time AF2 was best for 88/97 targets.