Really excited to announce that AntiBERTa is now published in @Patterns_CP! Here we describe a transformer model that demonstrates understanding of antibody sequences 🧵 (1/6)
#machinelearning #antibodies #drugdiscovery
cell.com/patterns/fullt…
We pre-train a transformer model based on RoBERTa, exclusively on full-length antibody/B-cell receptor sequences, using the masked language modelling (MLM) objective. FYI, other similar transformers include BioPhi (@prihodad), ABLang (@HegelundOlsen), and AntiBERTy (@jeffruffolo). (2/6)
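For readers who want to see what that setup looks like in code, here is a minimal sketch of RoBERTa-style MLM pre-training on antibody sequences using the Hugging Face transformers library. The character-level vocabulary, hyperparameters, and example sequence are illustrative assumptions, not the settings from the paper.

```python
# Minimal sketch of RoBERTa-style MLM pre-training on antibody sequences,
# written against the Hugging Face transformers API. The vocabulary,
# hyperparameters, and example sequence are illustrative assumptions,
# not the settings used in the paper.
import torch
from transformers import RobertaConfig, RobertaForMaskedLM

# Character-level vocabulary: the 20 amino acids plus special tokens (assumed).
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
SPECIALS = ["<pad>", "<cls>", "<eos>", "<mask>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + AMINO_ACIDS)}

config = RobertaConfig(
    vocab_size=len(VOCAB),
    hidden_size=256,                  # illustrative; the published model is larger
    num_hidden_layers=4,
    num_attention_heads=4,
    max_position_embeddings=160,      # antibody variable domains are ~110-130 aa
    pad_token_id=VOCAB["<pad>"],
)
model = RobertaForMaskedLM(config)

def encode(seq: str) -> torch.Tensor:
    """Encode an amino-acid sequence as token ids with <cls>/<eos> added."""
    ids = [VOCAB["<cls>"]] + [VOCAB[aa] for aa in seq] + [VOCAB["<eos>"]]
    return torch.tensor([ids])

# Toy heavy-chain fragment (illustrative, not taken from the paper's data).
input_ids = encode("EVQLVESGGGLVQPGGSLRLSCAAS")

# MLM objective: hide ~15% of residues and train the model to recover them.
labels = input_ids.clone()
mask = torch.rand(input_ids.shape) < 0.15
mask[:, 0] = mask[:, -1] = False      # never mask the special tokens
mask[:, 5] = True                     # ensure at least one residue is masked here
input_ids = input_ids.masked_fill(mask, VOCAB["<mask>"])
labels[~mask] = -100                  # loss is computed only on masked positions

loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()                       # one illustrative training step
```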
We show that the embeddings pick up nuanced features of BCR/antibody sequences: for example, V gene usage, mutational load, and, remarkably, B cell provenance. This is all done in a zero-shot setting, i.e. none of these labels were provided during pre-training. (3/6)
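A rough sketch of how such zero-shot embeddings can be pulled out of the encoder, reusing the toy model and encode() helper from the pre-training sketch above; with the real model you would load the published weights instead.

```python
# Rough sketch: pulling zero-shot sequence embeddings out of the encoder.
# Reuses the toy VOCAB, encode() and model from the pre-training sketch above;
# with the real model you would load the published weights instead.
import torch

with torch.no_grad():
    hidden = model.roberta(
        input_ids=encode("EVQLVESGGGLVQPGGSLRLSCAAS")
    ).last_hidden_state                       # shape (1, L, hidden_size)
    embedding = hidden.mean(dim=1)            # mean-pool residues -> (1, hidden_size)

# Embeddings like this can then be projected (e.g. with UMAP) and coloured by
# V gene, mutational load, or B-cell subset, none of which were seen in training.
```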
Transformers are powered by self-attention, and AntiBERTa is no exception. We see that the self-attention maps correlate broadly with positions of contact. While not perfect, AntiBERTa does seem to understand some pairwise dependencies. (4/6)
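To make the attention analysis concrete, here is a small sketch (again reusing the toy model above) of how the per-layer, per-head self-attention maps can be extracted; comparing them against residue-residue contact maps from antibody structures is left out.

```python
# Small sketch: extracting per-layer, per-head self-attention maps
# (again reusing the toy model and encode() helper from above).
import torch

with torch.no_grad():
    out = model.roberta(
        input_ids=encode("EVQLVESGGGLVQPGGSLRLSCAAS"),
        output_attentions=True,
    )

# out.attentions is a tuple with one (batch, heads, L, L) tensor per layer.
last_layer = out.attentions[-1][0]   # (heads, L, L) for the single sequence
avg_map = last_layer.mean(dim=0)     # head-averaged attention map, (L, L)
print(avg_map.shape)                 # this map can be compared with contact maps
```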
Finally, we fine-tune the model for paratope prediction and show that it achieves SOTA performance. This opens up novel ways to investigate convergence in repertoire datasets, such as Paratyping (@EveRichardson20). (5/6)
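Paratope prediction can be framed as per-residue binary classification on top of the pre-trained encoder. The sketch below uses the transformers token-classification head; the sequence, labels, and hyperparameters are made up for illustration, and in the real workflow the encoder would start from the pre-trained AntiBERTa weights rather than from scratch.

```python
# Minimal sketch: paratope prediction framed as per-residue binary classification
# (paratope vs. non-paratope) using the transformers token-classification head.
# The sequence, labels, and hyperparameters below are made up for illustration.
import torch
from transformers import RobertaConfig, RobertaForTokenClassification

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
SPECIALS = ["<pad>", "<cls>", "<eos>", "<mask>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + AMINO_ACIDS)}

config = RobertaConfig(
    vocab_size=len(VOCAB), hidden_size=256, num_hidden_layers=4,
    num_attention_heads=4, max_position_embeddings=160,
    num_labels=2, pad_token_id=VOCAB["<pad>"],
)
# In the real workflow the encoder is initialised from the pre-trained
# AntiBERTa weights rather than trained from scratch as it is here.
model = RobertaForTokenClassification(config)

seq = "EVQLVESGGGLVQPGGSLRLSCAAS"            # illustrative fragment
ids = torch.tensor([[VOCAB["<cls>"]] + [VOCAB[a] for a in seq] + [VOCAB["<eos>"]]])

labels = torch.zeros_like(ids)               # 0 = non-paratope residue
labels[0, 10:15] = 1                         # pretend residues 10-14 contact antigen
labels[0, 0] = labels[0, -1] = -100          # ignore special tokens in the loss

out = model(input_ids=ids, labels=labels)
out.loss.backward()                          # one illustrative fine-tuning step
```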
All in all, a huge thanks to the @alchemabtx team, and I want to particularly thank our stellar head of tech @jakegalson, along with @lauramitch29, @jhrf and @all_your_bayes. (6/6)
Oh, and reviewers #1-3, thank you all! You made the manuscript better.