A new framework for evaluating machine learning in biochemistry and its application to small molecules and peptides
Abstract
We are excited to propose a talk discussing the importance of developing new frameworks that improve the reliability of machine learning evaluation for drug discovery, and the risks associated with traditional approaches. The talk will cover work from our lab, published as a conference paper [1] and as an original research article in Bioinformatics [2], that offers innovative solutions to critical challenges in AI for drug discovery, making it highly relevant for this conference.

We start by considering the longstanding issue of model generalization to out-of-distribution data, a crucial hurdle for advancing scientific discovery, where models must generalize to new molecules. Existing methods for splitting data into training and testing sets, such as temporal, sequence-identity, or scaffold-based criteria, offer little guidance on which approach is optimal. To overcome this, we developed AU-GOOD, a novel metric that quantifies model performance under increasing dissimilarity between training and test sets. The metric is applicable to a wide range of biochemical entities, is paired with a new partitioning algorithm for more rigorous model testing, and allows future model performance against a specific target deployment distribution to be estimated. Recognizing the wide range of similarity functions used in biochemistry, we also propose criteria to guide the selection of the most appropriate function for partitioning.

A further obstacle to reliable and robust evaluation is that less experienced researchers may inadvertently fall into methodological malpractice. We explored how to mitigate this problem by automating key steps of the machine learning lifecycle, which has the added benefit of democratizing predictive modeling for experimental scientists while improving trustworthiness through rigorous reporting and reproducibility. We demonstrate these ideas on the task of peptide bioactivity prediction, a vital yet underexplored area in pharmaceutical drug discovery.

Overall, this talk discusses how to improve the trustworthiness of machine learning property predictions for two modalities of great pharmaceutical importance, small molecules and peptides, through the lens of automation and robust evaluation of the models' generalization capabilities.
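To make the evaluation idea concrete, below is a minimal, illustrative sketch of how an AU-GOOD-style quantity could be computed: the model is scored on test subsets of increasing dissimilarity to the training set, and the resulting curve is aggregated with weights derived from a target deployment distribution. The similarity scale, the accuracy metric, the binning scheme, and all data here are our own toy assumptions, not the formal definition or implementation from [1].

```python
# Illustrative sketch only (toy data, assumed [0, 1] similarity scale);
# see [1] for the formal AU-GOOD definition and partitioning algorithm.
import numpy as np


def good_curve(test_to_train_sim, y_true, y_pred, thresholds, metric):
    """Score the model on test subsets whose similarity to the training set
    is below each threshold (lower threshold = harder, more dissimilar subset)."""
    scores = []
    for t in thresholds:
        mask = test_to_train_sim < t            # keep only test points dissimilar to training
        if mask.sum() < 2:                      # too few points to score reliably
            scores.append(np.nan)
            continue
        scores.append(metric(y_true[mask], y_pred[mask]))
    return np.array(scores)


def au_good(scores, thresholds, deployment_sims):
    """Aggregate the curve, weighting each similarity bin by how often that level
    of similarity to the training set occurs in the target deployment set."""
    weights, _ = np.histogram(deployment_sims, bins=np.append(thresholds, 1.0))
    weights = weights / max(weights.sum(), 1)
    valid = ~np.isnan(scores)
    return float(np.sum(scores[valid] * weights[valid]))


# Toy usage with random data standing in for real similarities and predictions.
rng = np.random.default_rng(0)
test_to_train_sim = rng.uniform(0, 1, 200)      # max similarity of each test point to the training set
y_true = rng.integers(0, 2, 200)
y_pred = rng.integers(0, 2, 200)
deployment_sims = rng.uniform(0, 1, 500)        # similarities of deployment molecules to the training set
thresholds = np.linspace(0.1, 0.9, 9)

accuracy = lambda yt, yp: float((yt == yp).mean())
curve = good_curve(test_to_train_sim, y_true, y_pred, thresholds, accuracy)
print("AU-GOOD (toy):", au_good(curve, thresholds, deployment_sims))
```

In practice, the choice of similarity function, thresholds, and weighting is exactly what [1] formalizes; the sketch above only conveys the overall shape of the computation.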
[1] Fernández-Díaz R, Hoang TL, Lopez V, Shields DC. A new framework for evaluating model out-of-distribution generalisation for the biochemical domain. In: Proceedings of the Thirteenth International Conference on Learning Representations (ICLR). 2025. https://openreview.net/forum?id=qFZnAC4GHR
[2] Fernández-Díaz R, Cossio-Pérez R, Agoni C, Hoang TL, Lopez V, Shields DC. AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. Bioinformatics. 2024 Sep;40(9).