First, the metrics are RMSDs computed by aligning the C/N/CA/CB atoms across the chain and then calculating the RMSD over a region. That is, align every residue of the VH, then calculate the RMSD across CDRH3, CDRH1, etc. This is on ~35 antibodies from the ImmuneBuilder test set (2/5)
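A minimal numpy sketch of the metric as described. It superposes the model onto the reference using the VH atoms only (Kabsch algorithm), then computes RMSD over a CDR subset without re-fitting. Assumptions: the atoms are already extracted and matched one-to-one between model and reference as (N, 3) arrays. Glycine has no CB, so that pairing is left to the caller. The names `align_mask`/`region_mask` are hypothetical. This is not the exact script behind these numbers.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Optimal rotation mapping centred `mobile` onto centred `target` (both N x 3)."""
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    P = mobile - mobile_centroid
    Q = target - target_centroid
    # 3x3 covariance matrix and its SVD
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mobile_centroid, target_centroid


def region_rmsd(model, reference, align_mask, region_mask):
    """Superpose on `align_mask` atoms (e.g. all VH), then RMSD over `region_mask` atoms (e.g. CDRH3)."""
    R, mc, tc = kabsch(model[align_mask], reference[align_mask])
    # Apply the same rigid transform to every model atom, then compare only the region
    moved = (model - mc) @ R.T + tc
    diff = moved[region_mask] - reference[region_mask]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the fit uses the whole VH, a CDR that is locally correct but mis-positioned relative to the framework still scores badly.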
ESMFold's CDRH3 accuracies are better than I expected. Where it falls short is on the "canonical" CDRs. It would've been nice to compare the VH-VL orientations and discuss how ImmuneBuilder doesn't generate D-amino acids, etc. (3/5)
However, antibody-specific tools like ImmuneBuilder tend -not- to model the constant domains, even though these are important for binding and function (pnas.org/doi/pdf/10.107…). ESMFold can model constant domains, but not the full antibody tail (4/5)
While it's tempting to treat -one- tool as the "top" solution for every problem, we have to remind ourselves again that there are other biologically relevant use cases that only "suboptimal" tools can solve. We really need a biologically relevant competition soon! (5/5)
First, it's pretty crazy that we even have antibody-specific tools, since #AlphaFold2, #ESMFold and #OmegaFold all do a decent job at antibody modelling. However, antibody-specific tools offer -some- necessary feature (e.g. being MSA-free) (2/6)
The demand for these tools is likely driven by interest from pharma & biotech, but we don't see anywhere near the same level of interest in other polymorphic proteins like TCRs and MHCs (🤔). Regardless, given this interest, I think an antibody-specific CASP should be resurrected! (3/6)
Context: a single-chain Fv (scFv) is an antibody construct in which the heavy- and light-chain variable domains (VH and VL) are joined by a short peptide linker. It's not the conventional "Y"-shaped molecule, and is useful for engineering, phage display, etc. See @AlissaHummer's post blopig.com/blog/2021/07/a… (2/5)
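To make "linked" concrete, here is a toy sketch that assembles an scFv sequence as VH + linker + VL. The (G4S)3 linker (GGGGSGGGGSGGGGS) is a commonly used choice, and both VH-VL and VL-VH orders exist. `build_scfv` and the short placeholder strings are hypothetical and are not real antibody domains.

```python
def build_scfv(vh: str, vl: str, linker_repeats: int = 3, vh_first: bool = True) -> str:
    """Join VH and VL with a (Gly4Ser)n linker; domain order is configurable."""
    # (G4S)n: n copies of GGGGS; n = 3 is a common default (assumption, not a rule)
    linker = "GGGGS" * linker_repeats
    first, second = (vh, vl) if vh_first else (vl, vh)
    return first + linker + second


# Placeholder fragments, NOT real variable-domain sequences
vh_placeholder = "QVQLVQ"
vl_placeholder = "DIQMTQ"
print(build_scfv(vh_placeholder, vl_placeholder))  # QVQLVQGGGGSGGGGSGGGGSDIQMTQ
```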
Thermostability (measured by TS50, the temperature at which the scFv loses 50% of its binding) is only weakly predicted by transformers (ESM-1v + ESM-1b), whether used 0-shot or fine-tuned. CNNs using sequence and structural (energy) convolutions perform better (?) [hard to tell, sorry!🙈] (3/5)
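For the "0-shot" part, one common recipe with masked language models like ESM-1v is masked-marginal scoring. You mask each mutated position and sum log p(mutant) - log p(wild type). The thread doesn't say exactly how TS50 was scored, so treat this as an assumption. `masked_probs` is a hypothetical stand-in for a real model call, and a toy table replaces it here.

```python
import math
from typing import Callable, Dict, List, Tuple

# (0-based position, wild-type residue, mutant residue)
Mutation = Tuple[int, str, str]
# Hypothetical model interface: residue probabilities at `pos` with that position masked
MaskedProbs = Callable[[str, int], Dict[str, float]]


def masked_marginal_score(wt_seq: str, mutations: List[Mutation], masked_probs: MaskedProbs) -> float:
    """Sum of log-odds (mutant vs wild type) over mutated positions; higher = more favoured."""
    score = 0.0
    for pos, wt, mut in mutations:
        if wt_seq[pos] != wt:
            raise ValueError(f"position {pos} is {wt_seq[pos]!r}, not {wt!r}")
        probs = masked_probs(wt_seq, pos)
        score += math.log(probs[mut]) - math.log(probs[wt])
    return score


def toy_masked_probs(seq: str, pos: int) -> Dict[str, float]:
    # Toy stand-in for a protein language model; ignores context entirely
    return {"A": 0.5, "G": 0.25, "S": 0.25}


print(masked_marginal_score("GAS", [(0, "G", "A")], toy_masked_probs))  # ≈ 0.693 (log 2)
```

A higher score means the model finds the mutant more plausible than the wild type. Whether that tracks thermostability is exactly what the benchmark tests.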
Predicting Ab-Ag interactions is a sub-problem of the broader protein-protein interaction problem. There are many facets to consider here, including, but not limited to, identifying the correct antigen (let alone the correct epitope), the correct paratope, and the correct orientation (2/5)
@antibodymap's team first show that the pLDDT scores of true Ab-Ag pairs (i.e. those where we know the Ab binds the antigen) and false Ab-Ag pairs (i.e. the Ag was randomly assigned to an Ab) are practically indistinguishable, suggesting that score-based discrimination is HARD. (3/5)
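One way to quantify "score-based discrimination is HARD" is the ROC AUC. It is the probability that a randomly chosen true pair gets a higher pLDDT than a randomly chosen false pair, so an AUC near 0.5 means the scores don't separate the two sets. This is a generic sketch and not necessarily the analysis in the thread. The scores below are made up for illustration.

```python
from itertools import product
from typing import Sequence


def roc_auc(true_scores: Sequence[float], false_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of ROC AUC; ties count as half a win."""
    if not true_scores or not false_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for t, f in product(true_scores, false_scores):
        if t > f:
            wins += 1.0
        elif t == f:
            wins += 0.5
    return wins / (len(true_scores) * len(false_scores))


# Made-up pLDDT values purely for illustration
print(roc_auc([80.0, 70.0], [80.0, 70.0]))  # 0.5: no separation
print(roc_auc([90.0, 85.0], [60.0, 55.0]))  # 1.0: perfect separation
```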
Really excited to announce that AntiBERTa is now published in @Patterns_CP! Here we describe a transformer model that demonstrates understanding of antibody sequences 🧵 (1/6)
We pre-train a transformer model based on RoBERTa, exclusively on full-length antibody/B-cell receptor sequences, using the masked language modelling (MLM) objective. FYI, other similar transformers include BioPhi (@prihodad), ABLang (@HegelundOlsen) and AntiBERTy (@jeffruffolo) (2/6)
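For readers new to MLM, here is a sketch of the standard BERT/RoBERTa corruption step applied to an amino-acid sequence. Each token is selected with probability 0.15. A selected token is replaced by a mask token 80% of the time, by a random residue 10% of the time, and left unchanged 10% of the time, and the model is trained to recover the originals. RoBERTa re-samples this masking each time a sequence is seen (dynamic masking). Whether AntiBERTa uses exactly these ratios is an assumption here, and `<mask>`/`mlm_corrupt` are illustrative names.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mlm_corrupt(seq: str, mask_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); labels are None where no prediction is scored."""
    rng = rng or random.Random()
    inputs = list(seq)
    labels: List[Optional[str]] = [None] * len(seq)
    for i, residue in enumerate(seq):
        if rng.random() >= mask_prob:
            continue  # not selected: no loss at this position
        labels[i] = residue  # the model must predict the original residue here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(AMINO_ACIDS)
        # else: keep the original residue in the input
    return inputs, labels


# Placeholder sequence, not a real antibody
inputs, labels = mlm_corrupt("QVQLVQSGAEVKKPGAS", rng=random.Random(0))
```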
We show that the embeddings pick up nuanced features of BCR/antibody sequences: for example, V gene usage, mutational load and, remarkably, B cell provenance. This is all done in a zero-shot setting, i.e. none of these labels were provided during pre-training. (3/6)
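Here is a rough sketch of how such zero-shot analyses are often done. Each sequence's per-residue embeddings are mean-pooled into one vector and projected to 2D. Only afterwards are the points coloured by labels the model never saw (V gene, mutational load, B cell type). PCA via numpy's SVD is a simple stand-in for whatever projection the paper uses. The random arrays are synthetic stand-ins for real AntiBERTa outputs.

```python
import numpy as np


def mean_pool(residue_embeddings: np.ndarray) -> np.ndarray:
    """(L, D) per-residue embeddings -> (D,) sequence embedding."""
    return residue_embeddings.mean(axis=0)


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project (N, D) rows onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


# Synthetic stand-ins: two groups of sequences whose embeddings differ systematically
rng = np.random.default_rng(0)
group_a = [5.0 + 0.1 * rng.standard_normal((12, 8)) for _ in range(5)]
group_b = [-5.0 + 0.1 * rng.standard_normal((12, 8)) for _ in range(5)]
X = np.stack([mean_pool(e) for e in group_a + group_b])
coords = pca_2d(X)  # (10, 2); labels are used only to colour these points afterwards
```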