tbepler
Jun 14 · 9 tweets · 4 min read
Excited to announce PoET, the retrieval-augmented generative protein language model developed by @timt1630 and @tbepler1, which achieves state-of-the-art unsupervised variant function prediction performance on #ProteinGym. #MachineLearning #ProteinML 1/9

arxiv.org/abs/2306.06156
Inspired by evolution, PoET conditions on observed protein sequences to infer fitness constraints and extrapolate a generative distribution of protein sequences. This lets PoET focus on any level of homology, from superfamilies to families to subfamilies and beyond. 2/9
The key idea in the design of PoET was to create a transformer that could condition on homologous sequences but did not require aligned inputs. Our solution was to model the generative process of whole protein families as a sequence-of-sequences generative modeling problem. 3/9
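To make the sequence-of-sequences framing concrete, here is a minimal Python sketch (not PoET's code) that flattens an unaligned family into a single token stream. The `<s>`/`</s>` delimiters and the `flatten_family` name are hypothetical. Under this framing, as we read it, a family is modeled autoregressively one residue at a time, with each residue conditioned on the earlier residues of its own sequence and on every earlier sequence, so sequences of different lengths never need to be aligned.

```python
from typing import List, Tuple

# Hypothetical delimiter tokens marking the start and end of each sequence.
START, END = "<s>", "</s>"


def flatten_family(seqs: List[str]) -> Tuple[List[str], List[int], List[int]]:
    """Concatenate unaligned sequences into a single token stream.

    Returns the tokens, the index of the sequence each token came from,
    and each token's position within its own sequence (restarting at 0).
    """
    tokens: List[str] = []
    seq_ids: List[int] = []
    positions: List[int] = []
    for sid, seq in enumerate(seqs):
        chain = [START] + list(seq) + [END]
        for pos, tok in enumerate(chain):
            tokens.append(tok)
            seq_ids.append(sid)
            positions.append(pos)
    return tokens, seq_ids, positions


# Sequences of different lengths need no alignment or gap characters.
tokens, seq_ids, positions = flatten_family(["MKV", "MK"])
print(tokens)     # ['<s>', 'M', 'K', 'V', '</s>', '<s>', 'M', 'K', '</s>']
print(positions)  # [0, 1, 2, 3, 4, 0, 1, 2, 3]
```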
But because the order of sequences in a family is arbitrary, we developed a new transformer layer that efficiently attends to residues in order within each sequence, but treats the sequences themselves as an unordered set. 4/9
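The toy sketch below illustrates only the order-invariance property; it is not PoET's layer. The embedding `toy_vec` is hypothetical, positions enter it additively rather than through the relative positional encodings a real transformer would use, and it omits any mechanism for telling a residue of the current sequence apart from a homolog's residue at the same position, which a real layer needs. Because each token is described only by its content and its position within its own sequence, and causal attention sums over all earlier tokens, swapping the conditioning sequences leaves the outputs for the final sequence unchanged (up to floating-point rounding).

```python
import math
from typing import List, Tuple

START, END = "<s>", "</s>"  # hypothetical delimiter tokens


def flatten_family(seqs: List[str]) -> Tuple[List[str], List[int]]:
    """Concatenate sequences; positions restart at 0 in each sequence."""
    tokens, positions = [], []
    for seq in seqs:
        for pos, tok in enumerate([START] + list(seq) + [END]):
            tokens.append(tok)
            positions.append(pos)
    return tokens, positions


def toy_vec(tok: str, pos: int, dim: int = 4) -> List[float]:
    """Hypothetical embedding of a token at a within-sequence position.
    It never sees which sequence the token belongs to."""
    h = sum(ord(c) for c in tok)
    return [math.sin(0.1 * h * (k + 1) + 0.3 * pos) for k in range(dim)]


def set_attention(seqs: List[str]) -> List[List[float]]:
    """Single-head causal self-attention over the flattened family."""
    tokens, positions = flatten_family(seqs)
    vecs = [toy_vec(t, p) for t, p in zip(tokens, positions)]
    outputs = []
    for q in range(len(vecs)):
        keys = range(q + 1)  # causal: the query sees itself and all earlier tokens
        scores = [sum(a * b for a, b in zip(vecs[q], vecs[k])) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append(
            [sum(w * vecs[k][d] for w, k in zip(weights, keys)) / total
             for d in range(len(vecs[q]))]
        )
    return outputs


# Swap the two conditioning sequences; the query sequence "MK" stays last.
a = set_attention(["MKV", "MQ", "MK"])
b = set_attention(["MQ", "MKV", "MK"])
n = len("MK") + 2  # tokens in the final sequence, including delimiters
same = all(
    math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12)
    for x, y in zip(a[-n:], b[-n:])
    for u, v in zip(x, y)
)
print(same)  # True
```

Had the embedding used each token's global index in the stream instead, the outputs for "MK" would depend on which homolog came first.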
This design was also critical for enabling PoET to extrapolate to context lengths well beyond those used during training. A PoET model trained with an 8k-token context easily generalizes to context lengths of 64k tokens and beyond. 5/9
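One plausible way to see why extrapolation can work, assuming position indices restart at 0 in each sequence (an assumption about the layer, not a detail stated in this thread): the largest position index the model ever sees is set by the longest single sequence, not by the total context length. The hypothetical `pack_context` below greedily fills a token budget with homologs; growing the budget from 8k to 64k tokens adds more sequences, but the largest within-sequence position stays the same. The family of 1,000 homologs of 300 residues is made-up illustration data.

```python
from typing import List, Tuple


def pack_context(homolog_lengths: List[int], max_tokens: int) -> Tuple[int, int, int]:
    """Greedily add homologs (each with 2 delimiter tokens) until the budget is full.

    Returns (number of sequences used, tokens used, largest within-sequence position).
    """
    used, n_seqs, max_pos = 0, 0, 0
    for length in homolog_lengths:
        cost = length + 2  # start token + residues + end token
        if used + cost > max_tokens:
            break
        used += cost
        n_seqs += 1
        max_pos = max(max_pos, cost - 1)  # positions restart at 0 per sequence
    return n_seqs, used, max_pos


# Hypothetical family: 1,000 homologs of 300 residues each.
lengths = [300] * 1000
print(pack_context(lengths, 8_192))   # (27, 8154, 301)
print(pack_context(lengths, 65_536))  # (217, 65534, 301)
```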
As a retrieval-augmented language model, PoET is not limited to its training data. It can learn from sequences in any database without retraining. I’m really excited to see what’s possible with creative prompt/context engineering! 6/9
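As a sketch of what using "any database without retraining" can look like in practice: the prompt is assembled at inference time by retrieving homologs of a query from whatever sequence collection is at hand. The k-mer Jaccard similarity below is a toy stand-in for a real homology search tool, and `retrieve_homologs`, its `min_sim` threshold, and the example database are all hypothetical. Swapping the database or the threshold changes the context, and therefore the model's predictions, without touching any weights; a stricter threshold loosely corresponds to focusing on a subfamily rather than a superfamily.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_homologs(query: str, database: Dict[str, str],
                      min_sim: float = 0.2, k: int = 3) -> List[str]:
    """Return IDs of database sequences whose k-mer Jaccard similarity to the
    query is at least `min_sim`, most similar first."""
    q = kmers(query, k)
    hits = [(jaccard(q, kmers(s, k)), name) for name, s in database.items()]
    hits = [(sim, name) for sim, name in hits if sim >= min_sim]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [name for _, name in hits]


database = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQL", "c": "GGSGGSGGSG"}
print(retrieve_homologs("MKTAYIAKQR", database))               # ['a', 'b']
print(retrieve_homologs("MKTAYIAKQR", database, min_sim=0.9))  # ['a']
```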
PoET can generate high-diversity, high-fitness variants and is not limited to substitutions: it can generate and score indels as well! 7/9
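Because a sequence-of-sequences model is autoregressive over unaligned sequences, any variant, including an insertion or deletion, can be scored by its log-likelihood relative to the wild type under the same context. In the sketch below, a smoothed first-order Markov (bigram) model fitted to the context sequences is a hypothetical stand-in for PoET's conditional distribution; only the scoring recipe (variant log-likelihood minus wild-type log-likelihood) is the point, and the context sequences are made up.

```python
import math
from collections import Counter
from typing import Callable, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # hypothetical boundary symbols


def fit_bigram(context: List[str], alpha: float = 1.0) -> Callable[[str, str], float]:
    """Add-alpha smoothed log P(next | previous) estimated from the context sequences."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for seq in context:
        chain = START + seq + END
        for prev, nxt in zip(chain, chain[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    vocab = len(ALPHABET) + 1  # residues plus the end symbol

    def log_prob(prev: str, nxt: str) -> float:
        return math.log((pair_counts[(prev, nxt)] + alpha)
                        / (prev_counts[prev] + alpha * vocab))

    return log_prob


def log_likelihood(seq: str, log_prob: Callable[[str, str], float]) -> float:
    """Sum of per-token log-probabilities, including the end symbol."""
    chain = START + seq + END
    return sum(log_prob(p, n) for p, n in zip(chain, chain[1:]))


def score_variant(variant: str, wild_type: str, log_prob) -> float:
    """Log-likelihood ratio; positive means the variant is preferred over wild type."""
    return log_likelihood(variant, log_prob) - log_likelihood(wild_type, log_prob)


context = ["MKTAYIAKQR", "MKTAYIAKQL", "MKSAYIAKQR"]
lp = fit_bigram(context)
wt = "MKTAYIAKQR"
for name, v in [("substitution", "MKTAYIAKER"),
                ("insertion", "MKTAYIGAKQR"),
                ("deletion", "MKTAYAKQR")]:
    print(name, round(score_variant(v, wt, lp), 2))
```

A toy Markov model like this is biased toward shorter sequences (fewer terms in the sum), so its scores for variants of different lengths should be read with care.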
PoET is now available via early access at OpenProtein.AI! If you’d like to try it out, please fill out the interest form at forms.gle/UeD9wDLvdG9LRw…! 8/9
Also, check out these tutorials on using PoET for de novo variant library design (docs.openprotein.ai/poet-tutorial-…) and generating substrate specific thiolases (docs.openprotein.ai/poet-thiolase-…) by Michael Barber! 9/9
