Aditya Malik, Nalini Ratha, et al.
CAI 2024
High-throughput sequencing generates vast, high-dimensional data characterized by extreme sparsity and noise. These characteristics pose significant challenges for conventional machine learning algorithms, which struggle to extract biologically meaningful patterns for classifying host health states. Optimal transport (OT) offers a powerful framework for addressing these challenges: it computes the minimum "cost" of transforming one microbial community profile into another, providing a flexible metric for comparing abundance profiles across disease states. We propose a network-informed OT approach that quantifies similarities between experimental profiles. This study systematically investigates several OT-based distance metrics (including unbalanced OT, structured OT, and the Gromov-Wasserstein (GW) distance) to evaluate how effectively each detects disease-associated biological changes.
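To make the "minimum cost of transforming one profile into another" concrete, here is a minimal sketch of entropy-regularized (Sinkhorn) OT between two relative-abundance profiles. The function names, the toy three-taxon cost matrix, and the choice of entropic regularization with `epsilon=0.05` are illustrative assumptions, not the method used in this study. It is balanced OT, so both profiles must sum to 1.

```python
import math


def normalize(counts):
    """Convert raw read counts into relative abundances that sum to 1."""
    total = float(sum(counts))
    return [c / total for c in counts]


def sinkhorn_distance(a, b, cost, epsilon=0.05, n_iter=500):
    """Entropy-regularized OT between histograms a and b (each summing to 1).

    Returns (transport_cost, plan), where
    transport_cost = sum_ij plan[i][j] * cost[i][j].
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / epsilon); smaller epsilon -> closer to exact OT
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale so row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return transport_cost, plan


# Hypothetical three-taxon example: cost grows with index distance
cost = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
c, _ = sinkhorn_distance(normalize([10, 10, 0]), normalize([0, 10, 10]), cost)
print(c)
```

In this example half of the mass must shift by two steps (or all of it by one step), so the printed cost is about 1.0. Unbalanced OT relaxes the hard marginal constraints into penalties (typically a KL divergence), so profiles whose total mass differs can be compared without forcing every unit of mass to be transported.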
We apply these methods to synthetically generated networks as well as to clinical datasets. To biologically inform the OT framework, we develop a custom cost function based on phylogenetic distances between features, which improves the alignment of evolutionarily related taxa. By leveraging computational interaction networks, this approach increases biological interpretability and enables robust, disease-agnostic patient stratification.
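A phylogenetically informed cost can be illustrated on a toy tree. When the ground cost between two taxa is their path length along the tree (patristic distance), the exact 1-Wasserstein distance has a closed form: the sum, over every branch, of the branch length times the absolute difference in abundance lying below that branch (closely related to weighted UniFrac). The sketch below assumes a hypothetical parent-pointer encoding of the tree, hypothetical branch lengths, and a hypothetical function name `tree_wasserstein`; the custom cost function in this study may be defined differently.

```python
from collections import defaultdict


def tree_wasserstein(parent, branch_length, p, q):
    """W1 between leaf distributions p and q, where the ground cost between
    two leaves is their path length (patristic distance) on the tree.

    parent: dict node -> parent node (the root has no entry)
    branch_length: dict node -> length of the edge to its parent
    p, q: dict leaf -> relative abundance (each sums to 1)
    """
    # Net mass difference at each node; pushed up toward the root below
    diff = defaultdict(float)
    for leaf in set(p) | set(q):
        diff[leaf] += p.get(leaf, 0.0) - q.get(leaf, 0.0)

    def depth(node):
        d = 0
        while node in parent:
            node = parent[node]
            d += 1
        return d

    total = 0.0
    # Deepest nodes first, so each node's subtree difference is complete
    for node in sorted(parent, key=depth, reverse=True):
        total += branch_length[node] * abs(diff[node])
        diff[parent[node]] += diff[node]
    return total


# Hypothetical toy tree ((t1:1, t2:1)n1:2, t3:3)root
parent = {"t1": "n1", "t2": "n1", "n1": "root", "t3": "root"}
branch_length = {"t1": 1.0, "t2": 1.0, "n1": 2.0, "t3": 3.0}
print(tree_wasserstein(parent, branch_length, {"t1": 1.0}, {"t2": 1.0}))  # 2.0
print(tree_wasserstein(parent, branch_length, {"t1": 1.0}, {"t3": 1.0}))  # 6.0
```

Moving all abundance from t1 to its sister taxon t2 costs 2.0, while moving it to the distant taxon t3 costs 6.0, so evolutionarily related taxa are cheap to align under this kind of cost.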