Our zero-shot generative AI framework: the model is conditioned on a target antigen structure and a chosen antibody scaffold sequence.
Antibodies are then generated de novo.
Note: all proteins binding the target (or its homologs) were removed from the training set.
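For a sense of the interface shape, here's a minimal sketch of the design loop in Python. Everything in it (class names, the model's sample method) is illustrative, not our actual API:

```python
# Hypothetical sketch of the zero-shot design loop. Names are illustrative,
# not our real API: condition a generative model on an antigen structure and
# a fixed antibody scaffold, then sample novel candidate sequences.
from dataclasses import dataclass

@dataclass
class DesignInputs:
    antigen_structure: str  # e.g. path to a PDB file for the target epitope
    scaffold_sequence: str  # fixed framework; only the CDRs are redesigned

def generate_candidates(model, inputs: DesignInputs, n: int) -> list[str]:
    """Sample n candidate antibody sequences conditioned on the inputs."""
    return [model.sample(inputs.antigen_structure, inputs.scaffold_sequence)
            for _ in range(n)]
```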
We validate the generated antibodies experimentally in the wet lab.
Our platform is extremely high-throughput: we can test ~3 million unique AI-generated designs each week.
A cycle of [AI ➡️ lab ➡️ data] takes just 6 weeks.
Example AI-generated cancer drug leads. Our model generates antibodies with higher affinities than a highly optimized therapeutic antibody.
The binders come straight out of the model.
In some cases, >90% of the CDR3 region has changed.
Generative models tend to memorize their training set. However, our model generates HCDR3s that differ substantially from those observed during training (on average, 5 of 13 positions are changed).
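To make that metric concrete: one way to score novelty is the minimum per-position difference between a generated HCDR3 and its nearest neighbor in the training set. A rough sketch (assuming a position-wise comparison between equal-length sequences; the exact metric in the paper may differ):

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def novelty(generated: str, training_set: list[str]) -> int:
    """Distance from a generated HCDR3 to its closest same-length training HCDR3."""
    same_len = [t for t in training_set if len(t) == len(generated)]
    # If no same-length neighbor exists, every position counts as changed.
    return min((hamming(generated, t) for t in same_len), default=len(generated))

# e.g. a 13-residue HCDR3 whose nearest training neighbor differs at
# 5 positions scores 5/13 under this metric.
```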
The sequences also differ from known antibodies in massive databases.
We predict structures of the discovered antibodies and find that their conformations are highly flexible.
Despite this flexibility, the model learns to place key residues in positions that likely mediate binding to the epitope.
We're confident in our experimental platform, which can now screen millions of unique AI-generated designs in a few weeks.
In that spirit, we've released the data behind this manuscript: the full list of sequences and their measured affinities.
Enjoy! 🤓
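A quick sketch of how you might start exploring the release once downloaded. The filename and column names below are placeholders, so check the data release for the real schema:

```python
import pandas as pd

# Placeholder filename and columns; see the data release for the actual schema.
df = pd.read_csv("binders.csv")            # one row per tested design
print(df.columns)                          # e.g. HCDR3 sequence, measured affinity
top = df.sort_values("affinity").head(10)  # lower Kd = tighter binding, so sort ascending
print(top[["hcdr3_sequence", "affinity"]])
```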
We're just getting started. De novo antibody design will unlock a new generation of better therapeutics, with higher probability of success.
We're leveraging this tech to get antibodies into the clinic in just 18-24 months, down from the typical six years.
Special thanks to the dream team for pulling this off 🌟
Not to mention the 200+ unlimiters working to make this a reality.
We have a dozen open roles for AI scientists. Come join us in transforming drug creation w/ Generative AI 🚀
Excited to share an update to our work on evolutionary-scale modeling (ESM)! Over the past year, we revised the paper with improved pretraining and downstream models, achieving state-of-the-art results across multiple benchmarks. (1/8) biorxiv.org/content/10.110…
Last year, we showed that Transformer language models learn intrinsic properties of proteins from sequences alone. But on quantitative benchmarks, these models did not improve over alignment-based methods, as shown by @roshan_m_rao et al. in TAPE. 😵 (2/8)
In this revision, we found that data diversity confers a major advantage. On the TAPE benchmark, our new models now perform on par with or better than alignment-based methods. 🎉 (3/8)