Abstract
Accurate prognosis is a critical component of oncology research, enabling personalized treatment planning and optimized health care resource use. Although existing prognostic models perform well on restricted data sets, they remain constrained by two limitations: modality-specific architectural designs and cancer type-specific training paradigms, both of which hinder cross-domain generalization. To address these challenges, the Unified Multimodal Pan-Cancer Survival Network (UMPSNet) was introduced, which integrated histopathology images, genomic expression profiles, and four metadata categories through structured text templates. UMPSNet used optimal transport-based attention for multimodal feature alignment and a guided mixture-of-experts mechanism to address cancer-type distribution shifts. Comprehensive evaluation across 3523 whole slide images (n = 2831) spanning five cohorts from The Cancer Genome Atlas demonstrated superior predictive performance (mean concordance index = 0.725), surpassing meticulously designed single-cancer models. Notably, in zero-shot transfer evaluation involving 392 pancreatic adenocarcinoma whole slide images (n = 66) from Peking University Third Hospital, UMPSNet achieved a concordance index of 0.652 without parameter fine-tuning, demonstrating generalization capacity for previously unseen malignancies. Additionally, UMPSNet identified prognostic gene signatures that consistently overlapped with clinically detected mutations (n = 92) while revealing novel gene candidates, validating its clinical relevance and providing complementary insights for precision oncology. Thus, the UMPSNet framework established a new paradigm for multimodal survival analysis by overcoming data heterogeneity and domain shift challenges, thereby providing a clinically adaptable tool for pan-cancer prognostic prediction.
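For readers unfamiliar with the reported metric, the concordance index can be illustrated with a minimal sketch of Harrell's pairwise formulation. This is an assumption about the general form of the metric, not the evaluation code used for UMPSNet; the function name `concordance_index` and the toy inputs are hypothetical. A value of 1.0 means predicted risks order every comparable patient pair correctly, and 0.5 corresponds to random ordering.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # shorter survival received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

In this toy example, five pairs are comparable and four are ordered correctly, so `c_index` is 0.8. Pairs with tied survival times are skipped in this sketch; established survival-analysis libraries handle such ties with more care.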