Filippo Utro, Aritra Bose, et al.
ISMB 2024
Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build trustworthy models. We considered the effect of different design choices in the development of binary predictors of peptide bioactivity and found that the choice of negative peptides and the use of homology-based partitioning strategies when constructing the evaluation set have a significant impact on perceived model performance, providing a more realistic estimate of how the model performs when exposed to new data. We also show that using protein language models to generate peptide representations can both simplify the computational pipelines and improve model performance, and that state-of-the-art protein language models perform similarly regardless of size or architecture. Finally, we integrate these results into an easy-to-use AutoML tool to support the development of new, robust predictive models for peptide bioactivity by biologists without strong machine learning expertise. Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML, and a dedicated web server is available at http://peptide.ucd.ie/AutoPeptideML.
Jennifer Kelly, Ashley Evans, et al.
ISMB 2025
Raúl Fernández Díaz, Lam Thanh Hoang, et al.
ISMB 2024
Tao Cai, Yifei Li, et al.
MRS Fall Meeting 2024