Connor W. Coley

Feb 13 • 5 tweets • 2 min read

(1/5) The latest from our group: Jenna Fromer's overview of computer-aided multi-objective optimization in small molecule discovery is now online & open access @Patterns_CP | doi.org/10.1016/j.patt… #compchem

(2/5) Our focus here is on Pareto optimization, which introduces additional algorithmic complexity but reveals more information about the trade-offs between objectives and is more robust than scalarization approaches.
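
To make the contrast with scalarization concrete, here is a minimal sketch in plain Python. The molecule scores and function names are hypothetical, both objectives are assumed to be maximized, and the scalarization shown is a simple weighted sum. It illustrates one well-known limitation: a weighted sum can never select a non-dominated point that lies in a concave region of the Pareto front, whatever the weights.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def best_by_weighted_sum(points: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the point with the highest weighted-sum (scalarized) score."""
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    return max(range(len(points)), key=scores.__getitem__)


# Hypothetical molecules scored on two objectives (illustrative values only).
molecules = [(0.9, 0.1), (0.1, 0.9), (0.45, 0.45), (0.3, 0.3)]

front = pareto_front(molecules)  # [0, 1, 2]

# Sweep the weight on objective 1 from 0.0 to 1.0 and record each winner.
winners = {best_by_weighted_sum(molecules, (w / 10, 1 - w / 10)) for w in range(11)}
```

Molecule 2 is on the Pareto front, yet it never appears in `winners`: it balances both objectives but sits below the line joining the two extreme molecules, so some extreme always scores higher under a weighted sum.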

(3/5) We highlight the extensions from single-objective Bayesian optimization to multi-objective Bayesian optimization when choosing molecules from a discrete library. The primary difference is the definition of the acquisition function, with a few options listed in the fig above
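
As an illustration of what a multi-objective acquisition function can look like, the sketch below scores candidates from a discrete library by hypervolume improvement over the current Pareto front. This is a common choice, but it is an assumption here and not necessarily one of the options in the figure. The sketch is also simplified: it handles two maximized objectives only and uses point predictions, whereas a Bayesian treatment would average the improvement over the surrogate model's predictive uncertainty. All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point
    `ref` (two objectives, both maximized)."""
    # Only points strictly better than the reference on both objectives count.
    pts = sorted(
        (p for p in points if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    area = 0.0
    best_y = ref[1]
    # Sweep from the largest x to the smallest, adding a strip each time
    # a point raises the best y value seen so far.
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_improvement(candidate: Point, front: List[Point], ref: Point) -> float:
    """Gain in dominated area if `candidate` were added to `front`."""
    return hypervolume_2d(front + [candidate], ref) - hypervolume_2d(front, ref)


# Hypothetical current front and predicted objectives for library members.
current_front = [(1.0, 3.0), (3.0, 1.0)]
reference = (0.0, 0.0)
predictions: Dict[str, Point] = {
    "mol_A": (2.0, 2.0),
    "mol_B": (0.5, 0.5),
    "mol_C": (2.5, 1.5),
}

scores = {
    name: hypervolume_improvement(pred, current_front, reference)
    for name, pred in predictions.items()
}
chosen = max(scores, key=scores.get)
```

A dominated candidate such as `mol_B` adds no area and scores zero, so this acquisition function favors molecules that extend or fill in the front.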

(4/5) We also describe the main categories of approaches in multi-objective generative design, _most of which_ follow the paradigm of "iterative distribution learning", & illustrate them through a few case studies

(5/5) Molecular design is fundamentally a multi-objective problem. We hope this will be a useful reference for folks looking to get into computer-aided molecular design and/or move away from scalarization

More from @cwcoley

Connor W. Coley

@cwcoley

Jul 14, 2022

1/ Learning compound-protein interactions (CPI) w/ sequence & compound alone is tantalizing, but time and time again, CPI models fail to beat simple baselines. Does this paper do so successfully? An analysis from @samgoldman19 :

https://twitter.com/KevinKaichuang/status/1543942451848790018

2/ The short answer is no, this model also fails to outperform a simple nearest neighbor baseline. The metabolic models presented in the paper are likely independently valuable, but are *not* enabled by deep learning

3/ Using the exact same splits, we made predictions with a KNN model based on a weighted average of sequence and substrate distance. We perfectly match DLKCat performance on the test set (SI 5A,B). See our gist here:

gist.github.com/samgoldman97/8…
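
For readers unfamiliar with this kind of baseline, here is a generic sketch of a nearest-neighbor predictor over a combined distance. It is not the code from the gist. The toy distance function, the mixing weight `alpha`, the unweighted neighbor average, and the example data are all hypothetical, and it assumes one reading of the description: the two distances are blended into a single score that is used to rank training examples.

```python
from typing import Callable, List, Tuple

# (protein sequence, substrate string, measured value)
Example = Tuple[str, str, float]
Distance = Callable[[str, str], float]


def mismatch_fraction(a: str, b: str) -> float:
    """Toy distance: fraction of positions that differ, counting the
    overhang of the longer string as mismatches."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(
        1 for i in range(n) if i >= len(a) or i >= len(b) or a[i] != b[i]
    )
    return mismatches / n


def knn_predict(
    query_seq: str,
    query_sub: str,
    train: List[Example],
    seq_dist: Distance = mismatch_fraction,
    sub_dist: Distance = mismatch_fraction,
    k: int = 3,
    alpha: float = 0.5,
) -> float:
    """Mean label of the k training examples closest to the query under
    alpha * sequence distance + (1 - alpha) * substrate distance."""
    if not train:
        raise ValueError("training set is empty")

    def combined(example: Example) -> float:
        seq, sub, _ = example
        return alpha * seq_dist(query_seq, seq) + (1 - alpha) * sub_dist(query_sub, sub)

    nearest = sorted(train, key=combined)[:k]
    return sum(value for _, _, value in nearest) / len(nearest)


# Hypothetical training data (illustrative values only).
train_set = [("MKV", "CCO", 1.0), ("MKL", "CCO", 2.0), ("AAA", "NNN", 10.0)]
prediction = knn_predict("MKV", "CCO", train_set, k=2)
```

In practice the toy distance would be replaced by a meaningful sequence distance and a chemical similarity measure for the substrate.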
