Noel O'Boyle @baoilleach
#11thICCS Johannes Kirchmair Hit Dexter 2.0: machine learning for triaging hits from biochemical assays
People are still talking about PAINS. The editors of various med chem journals teamed up to describe how to id false positive hits and reject them. J. Med. Chem. was the 1st to adopt PAINS as a decision framework to id cmpds that should be tested in detail.
The applicability of PAINS is limited. (P Kenny referenced)
Frequent hitters are not necessarily bad actors and v.v. Some aggregators and reactive cmpds but also true promiscuous cmpds.
Bad actors trigger false assay readouts and often (but not always) cause false positive readouts. Aggregators under v. specific assay conditions. PAINS cmpds cause problems under v. specific assay conditions.
PAINS: 480 SMARTS patterns. Describes their origin. Derived from 100K cmpds screened at high conc with a single screening technology. These cmpds had previously passed a garbage filter and were screened under detergent-containing conds (may miss reactive cmpds and aggregators).
PAINS are not related to frequent-hitters. Only a few (family A) are frequent hitters (emphasised by the original authors).
PAINS should not be used as a hard filter (will lose hits) and their absence does not imply a cmpd's benignity. Presence of PAINS not a problem per se, but may affect its developability.
Other approaches. Similarity-based approaches such as Aggregator Advisor. Molecules structurally similar to known aggregators are potential aggregators too. Not intended as a hard filter.
"Badapple". Underappreciated, but very interesting. J Cheminf 2016, 8, 29. A scaffold-based score for the likelihood of a cmpd being a frequent hitter or ....
Our approach. An ML model for pred of frequent hitters. Highlights cmpds for which extra caution should be taken when interpreting assay readouts.
Derived from PubChem assay data. Protein clustering first. MW filter, salt filter, element filter, duplicate filter (InChI) and quality checks.
How many cmpds have been measured in multiple assays (against diverse targets)? ATR is the active-to-tested ratio: the number of protein clusters for which a cmpd was measured as active versus the total no of clusters on which it was tested.
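A minimal sketch of the ATR calculation described above, in plain Python. The record layout, the compound and cluster names, and the rule that a cluster counts as "active" if the cmpd is active in any assay on that cluster are all assumptions for illustration; the talk did not give these details.

```python
from collections import defaultdict


def active_to_tested_ratio(records):
    """Return {compound: ATR} from (compound, protein_cluster, is_active) records.

    Assumption: a cluster counts once, and counts as active if the compound
    was active in at least one assay on that cluster.
    """
    tested = defaultdict(set)
    active = defaultdict(set)
    for compound, cluster, is_active in records:
        tested[compound].add(cluster)
        if is_active:
            active[compound].add(cluster)
    return {c: len(active[c]) / len(tested[c]) for c in tested}


# Hypothetical assay records, not real PubChem data.
records = [
    ("cmpd_A", "cluster_1", True),
    ("cmpd_A", "cluster_1", False),  # second assay on the same cluster
    ("cmpd_A", "cluster_2", False),
    ("cmpd_A", "cluster_3", False),
    ("cmpd_A", "cluster_4", False),
    ("cmpd_B", "cluster_1", True),
    ("cmpd_B", "cluster_2", True),
]

atr = active_to_tested_ratio(records)
print(atr)
```

Counting clusters rather than individual assays keeps a cmpd from looking promiscuous just because one target family was screened many times.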
Very few cmpds are highly promiscuous. Shows histogram of dataset percent versus ATR.
NP vs P vs HP (non-promiscuous to highly prom). Created thresholds for these based on the mean ATR, mean+1sigma, mean+3sigma. 20% are P, 3% are HP.
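One possible reading of these thresholds, sketched in plain Python. The mapping below (NP below the mean, P from mean+1sigma, HP from mean+3sigma, and no label in between) is an assumption: the talk named the three thresholds but not exactly how the classes are bounded. The use of the population standard deviation and the ATR values are also illustrative only.

```python
from statistics import mean, pstdev


def promiscuity_class(atr, mu, sigma):
    """Assign NP / P / HP from an ATR value (assumed class boundaries)."""
    if atr >= mu + 3 * sigma:
        return "HP"
    if atr >= mu + sigma:
        return "P"
    if atr < mu:
        return "NP"
    # Assumption: cmpds between mean and mean+1sigma get no label.
    return "unassigned"


# Hypothetical ATR values for ten cmpds.
atrs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05, 0.15, 0.2, 0.6]
mu = mean(atrs)
sigma = pstdev(atrs)

labels = [promiscuity_class(a, mu, sigma) for a in atrs]
for a, label in zip(atrs, labels):
    print(a, label)
```

Because ATR distributions are heavily skewed towards zero, thresholds expressed in sigmas above the mean pick out only the tail of the histogram.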
Dataset is really diverse. 399 protein clusters from 429 originally.
HP cmpds have log P higher by 1 unit, less flexibility, and a higher ratio of arom atoms to aliph atoms.
Model devel. Tried MACCS and Morgan - went with Morgan. Went with extremely randomized trees versus RF or SVM. Optimized hyperparams with grid search. SMOTE - synthetic minority oversampling. scikit-learn and RDKit.
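For readers unfamiliar with SMOTE, here is a simplified plain-Python sketch of the idea: new minority-class points are made by interpolating between a minority sample and one of its nearest minority neighbours. This is not the implementation used in the talk; the function name, the parameters and the toy 2-D points are assumptions, and a real workflow would more likely use a library implementation on fingerprint vectors.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (needs at least 2 points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (e.g. the rare "highly promiscuous" label).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Oversampling like this is typically applied to the training folds only, so that synthetic points do not leak into the data used for evaluation.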
Shows performance of classifier for discriminating different classes of promiscuity. 10-fold CV. Pretty good: AUC > 0.9, MCC = 0.61. (ed: what's MCC?) Independent test set with molecules that are not similar.
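On the MCC question: MCC is the Matthews correlation coefficient, which runs from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction) and is commonly used for imbalanced classes. A minimal sketch of the binary form in plain Python follows; the confusion-matrix counts are made up and are not from the talk.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: a model that calls everything "non-promiscuous".
# Accuracy would be 95%, but MCC shows there is no real predictive signal.
always_negative = matthews_corrcoef(tp=0, fp=0, tn=95, fn=5)
perfect = matthews_corrcoef(tp=50, fp=0, tn=50, fn=0)
print(always_negative, perfect)
```

That behaviour is why MCC is a sensible companion to AUC when only a few percent of cmpds are in the promiscuous classes.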
More testing of model. Looked at effect of similarity of molecules to training set. They fall off at ... 0.4 Tanimoto (? I think)
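A rough sketch of the kind of check described here: the Tanimoto similarity between a query fingerprint and its nearest neighbour in the training set. Fingerprints are represented as sets of on-bit indices; the bit values, the function names and the use of 0.4 as a cut-off are assumptions (the 0.4 figure itself is uncertain in these notes).

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def max_similarity_to_training(query_bits, training_fps):
    """Similarity of the query to its nearest training-set neighbour."""
    return max(tanimoto(query_bits, fp) for fp in training_fps)


# Hypothetical on-bit sets, not real Morgan fingerprints.
training_fps = [{1, 2, 3, 4}, {10, 11, 12}]
query = {3, 4, 5, 6}

nearest = max_similarity_to_training(query, training_fps)
print(nearest, "outside assumed 0.4 cut-off" if nearest < 0.4 else "inside")
```

Reporting this nearest-neighbour value alongside a prediction tells the user how far the query sits from anything the model was trained on.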
More testing: Dark chemical matter (DCM) dataset. Over 80 or 90% are correctly classified as non-promiscuous. Also tested on Enamine HTS collection. Also aggregators dataset (John Irwin).
The most surprising result was the test on approved drugs in DrugBank. More are predicted to be HP in DrugBank than among the aggregators!
Hit Dexter 2.0 website just went live. hitdexter2.zbh.uni-hamburg.de
GSK just published the 15 noisiest approved drugs in their assays. SLAS Discovery 2018.
The website gives a heatmap against various tests. Hit Dexter just gets one of the GSK molecules wrong. That molecule is very far away from the training set data (information which is reported in the webpage).
We believe Hit Dexter is able to predict freq hitters with high accuracy. Not intended as a hard filter. Help in design of screening libs, hit triage and follow-up, and id true prom cmpds. Validation is in progress on a large proprietary dataset.