Today, we announce the successful editing of DNA in human cells with gene editors fully designed with AI. Not only that, we've decided to freely release the molecules under the @ProfluentBio OpenCRISPR initiative.
Lots to unpack👇
AI has become increasingly pervasive in our daily lives, from how we sift through information and produce content to how we interact with the world. This marks a new chapter where AI is used to alter the fundamental blueprint of who we are - our DNA.
We were immediately drawn to gene editing due to the pressing societal needs, potential for one-and-done cures to disease, and the scientific challenge + complex biology involving protein, RNA, and DNA.
Our LLMs were trained on massive-scale sequence data and biological context to generate millions of diverse CRISPR-like proteins that do not occur in nature, thereby exponentially expanding virtually all known CRISPR families at will.
We then focused on type II effector complexes, generating Cas9-like proteins and gRNAs. These proteins are hundreds of mutations away from anything in nature.
We then characterized our generations in the wet lab and found that the AI-designed gene editors show comparable or improved activity and specificity relative to SpCas9, the prototypical gene editing effector. More characterization is underway but we're already impressed.
We also created an AI-designed base editor which exhibited really exciting performance in precise A->G edits.
The results point to a future where AI precisely designs what is needed to create a range of bespoke cures for disease. There is still much to build to achieve this vision. To spur innovation and democratization, we are freely releasing OpenCRISPR-1. Try it out!
This was truly a team effort across all disciplines of the company. @jeffruffolo SNayfach JGallagher @AadyotB JBeazer RHussain JRuss JYip EHill @MartinPacesa @alexjmeeske PCameron and the broader Profluent team. If you want to build with us, join. We’re hiring.
AFAIK, it's the first crystal structure of a functional #protein fully designed by #AI
A milestone in our quest to use language models to generate proteins that are unseen in nature & can function well in the real world. Read below👇
Proteins do everything in life. They're complex molecules and the workhorses for almost all of biology.
Nature has evolved proteins over billions of years. But instead of relying on natural evolution, what if we could take control and design proteins ourselves from scratch?
We look to artificial intelligence (AI) for help. In particular, we've seen language models used to controllably generate realistic text in #NLProc.
In our work, we've developed some powerful language models to learn from evolution to generate protein sequences.
#AlphaFold by #DeepMind used solid interdisciplinary intuitions for algorithm/model design. It wasn't just a rinse-and-repeat machine learning exercise. Details on methods are limited, but here's my best interpretation (+some predictions) so far: [1/n]
Protein sequence databases provide us with samples that have de facto passed the fitness test of evolution and are information-rich. "Genetics search" is a retrieval step to find nearest-neighbors as defined by sequence alignment. Why do we need nearest-neighbors (NNs), you ask?
There's a neat principle/intuition called coevolution that can help explain. Residues that touch in the folded protein tend to mutate together, so the mutational variance observed across related sequences can give clues to protein structure and function. Read more here: gremlin.bakerlab.org/gremlin_faq.php
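To make the coevolution intuition concrete, here's a minimal sketch that scores how strongly two alignment columns vary together. It is not AlphaFold's method, and GREMLIN fits a more sophisticated statistical model. Plain mutual information is swapped in as a simple stand-in, and the toy alignment and the function names (`mutual_information`, `top_coupled_pairs`) are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def top_coupled_pairs(msa, k=1):
    """Return the k column pairs with the highest mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy alignment: columns 1 and 3 always change together
# (K pairs with D, R pairs with E); column 2 varies independently.
toy_msa = [
    "AKLDE",
    "AKLDE",
    "ARLEE",
    "ARLEE",
    "AKIDE",
    "ARIEE",
]

best_pairs = top_coupled_pairs(toy_msa, k=1)
```

In this toy case the highest-scoring pair is the one built to covary. Real pipelines need far deeper alignments plus corrections for phylogenetic bias and indirect couplings, which raw mutual information ignores.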