Abstract
Immune checkpoint inhibitors (ICIs) targeting programmed cell death protein 1 (PD-1) and cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) have improved survival across multiple cancer types, but variable patient response highlights the need for better predictive biomarkers. Existing microbiome-based studies rely on taxonomic abundance derived from reference genome databases, which limits the discovery and functional interpretation of uncharacterized microbes. Here, we integrated metagenomic data from multiple ICI-treated cohorts spanning diverse cancer types and geographic regions and developed a deep learning model, BioP-VAE, that takes gene-level microbial abundance features as input and incorporates biological prior knowledge via protein sequence embeddings. Gene-level microbial abundance outperformed taxonomic abundance in predicting both ICI response and 12-month progression-free survival (PFS). In patients receiving combination immune checkpoint blockade (CICB), BioP-VAE achieved a mean AUC of 0.89 in intracohort evaluation and 0.88 in cross-cohort evaluation. In intracohort evaluation of monotherapy-treated patients, BioP-VAE achieved a mean AUC of 0.97. Feature attribution analysis identified the microbial genes that contributed most to prediction. Age-stratified analysis further revealed distinct predictive microbial signatures, suggesting that host age may modulate microbiome-immune interactions. To our knowledge, this is the first large-scale study to use deep learning to evaluate gene-level microbial abundance features for ICI response prediction across multiple cancer types. Our findings demonstrate that incorporating biological prior knowledge into deep learning models can improve the discovery of microbial biomarkers that generalize across cancer types and treatment settings, offering a novel strategy for patient stratification in immunotherapy.
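To make the cross-cohort figure concrete, the following minimal sketch shows one way a mean AUC over held-out cohorts could be computed. It is an illustration, not the BioP-VAE pipeline. It assumes that AUC denotes the area under the ROC curve and that cross-cohort evaluation means training on all cohorts except one and testing on the held-out cohort. The names `auc`, `cross_cohort_mean_auc`, and `fit_toy`, along with the toy data, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def auc(labels, scores):
    """Rank-based ROC AUC: the probability that a randomly chosen responder
    (label 1) scores higher than a randomly chosen non-responder (label 0).
    Ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_cohort_mean_auc(samples, fit):
    """samples: list of (cohort, features, label) tuples.
    fit: callable taking training rows [(features, label), ...] and
    returning a scoring function features -> float.
    Returns (mean AUC over held-out cohorts, per-cohort AUC dict)."""
    by_cohort = defaultdict(list)
    for cohort, x, y in samples:
        by_cohort[cohort].append((x, y))

    per_cohort = {}
    for held_out, test in by_cohort.items():
        # Train on every cohort except the held-out one.
        train = [row for c, rows in by_cohort.items() if c != held_out for row in rows]
        score = fit(train)
        per_cohort[held_out] = auc([y for _, y in test], [score(x) for x, _ in test])
    return mean(per_cohort.values()), per_cohort


def fit_toy(train):
    # Hypothetical stand-in for a trained model: it ignores the training
    # rows and simply uses the abundance of feature 0 as the score.
    return lambda x: x[0]


# Toy data (invented for illustration): two cohorts, one abundance feature.
toy = [
    ("A", [0.9], 1), ("A", [0.2], 0), ("A", [0.7], 1), ("A", [0.4], 0),
    ("B", [0.8], 1), ("B", [0.6], 0), ("B", [0.5], 1), ("B", [0.1], 0),
]
mean_auc, per_cohort = cross_cohort_mean_auc(toy, fit_toy)
```

An intracohort estimate would instead split samples within each cohort, for example by cross-validation, so the gap between the two numbers indicates how well a model transfers to cohorts it has not seen.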