We compared different RXN classification methods. 📍Using a BERT model borrowed from NLP, we matched the ground truth (Pistachio, @nmsoftware) with an accuracy of 98.2%.
We not only visualized what was important for the class predictions by looking at the different attention weights...
... but also mapped the chemical reaction space using the embeddings of our RXN BERT classifier (RXN fingerprint):
We investigated different combinations of RXN fingerprints: a) unsupervised, b) hand-crafted, c) supervised, d) a + b merged, and e) b + c merged.
And showed that our BERT RXN fingerprint can be used for efficient nearest neighbor searches in the reaction space without knowing the reaction center or distinguishing between reactants and reagents. Examples:
The similar precursors, products and reaction centers can be recognized even by non-experts.
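A minimal sketch of what such a nearest neighbor search could look like, assuming each reaction is already encoded as a fixed-length fingerprint vector. The toy vectors, reaction names, and the choice of cosine similarity are illustrative assumptions, not the actual RXN fingerprint or the metric used in our work:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, fingerprints, k=2):
    """Return the names of the k fingerprints most similar to the query."""
    scored = [(cosine_similarity(query, fp), name)
              for name, fp in fingerprints.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical 3-dimensional fingerprints; real embeddings are much longer.
toy_fingerprints = {
    "suzuki_1": [0.9, 0.1, 0.0],
    "suzuki_2": [0.8, 0.2, 0.1],
    "amide_coupling_1": [0.1, 0.9, 0.2],
    "reduction_1": [0.0, 0.2, 0.9],
}

query = [0.85, 0.15, 0.05]  # hypothetical fingerprint of a query reaction
hits = nearest_neighbors(query, toy_fingerprints, k=2)
print(hits)
```

Note that the search only needs the vectors: no reaction center annotation or reactant/reagent split enters the comparison. A brute-force scan like this is fine for a toy set; larger collections would likely need an approximate index.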
I had planned to add all my #LINO22 highlights chronologically to my thread but there are just too many. So, I will just cherry-pick a few here.
Starting with the inspirational talk by @ben_list on the importance of catalysis for a more sustainable future.
Related to that, the "Catalysis and Green Chemistry" panel discussion with Richard Schrock, Dave MacMillan (@dmac68), Liang Feng (@LiangFeng_chem), Jiangnan Li, Carla Casadevall (@CasadevallCarla) - well done!
Dave MacMillan’s (@dmac68) excellent advice for young group leaders/scientists:
- Be passionate and work on things you are truly excited about
- Be as generous as you can be and treat people with respect
@one_know_wonho: For yield prediction, how are the data labels distributed? Does the dataset also include reactant sets where no reaction happens between them (thus zero yield)?
Yes, the yields are distributed between 0 and 100%. For the Buchwald-Hartwig reactions (science.org/doi/abs/10.112…), the dataset contains more low- than high-yielding reactions. You can find more information in iopscience.iop.org/article/10.108….
2/ The AI model (VQGAN + CLIP) generated most of the image using “enzymatic chemical reactions. green chemistry. advanced unreal engine” as input.
It’s interesting that you can recognise the lab with the blackboards, the floor and the “reactions”.
A major limitation of current deep learning reaction prediction models is stereochemistry. It is not taken into account by graph neural networks and is a weakness of text-based prediction models, like the Molecular Transformer (doi.org/10.1021/acscen…).
How can we improve? 2/N
In this work, we take carbohydrate reactions as an example. Compared to the reactions in patents (avg. 0.4 stereocentres in product), carbohydrates contain multiple stereocentres (avg. >6 in our test set), which makes reactivity predictions challenging even for human experts. 3/N
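As a rough illustration of how an average like this could be estimated, here is a sketch that counts the tetrahedral chirality tags (`@` or `@@`) in product SMILES. This is a crude proxy and an assumption on my side, not the procedure used in the work: it only sees stereocentres that are explicitly specified in the string, and the molecules below are toy examples:

```python
import re

def count_stereo_tags(smiles):
    """Count specified tetrahedral centres: each '@' or '@@' tag counts once."""
    return len(re.findall(r"@+", smiles))

def average_stereo_tags(smiles_list):
    """Mean number of specified tetrahedral centres per molecule."""
    return sum(count_stereo_tags(s) for s in smiles_list) / len(smiles_list)

# Toy products (illustrative only, not taken from any dataset).
toy_products = [
    "CCO",                                     # ethanol, no stereocentre
    "C[C@H](N)C(=O)O",                         # alanine, one specified centre
    "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O",  # a hexose, anomeric centre unspecified
]

for smi in toy_products:
    print(smi, count_stereo_tags(smi))
print(average_stereo_tags(toy_products))
```

A cheminformatics toolkit would give a more reliable count, since it can also detect unspecified stereocentres.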
Awesome! All the video recordings of #AMLD2020 are now available on YouTube. Check out the ones from the fantastic speakers we had in the #AIMolecularWorld track⬇️