Abstract
Gastric cancer is a leading cause of cancer-related deaths globally. As mortality rates continue to rise, predicting cancer survival from multimodal data, including histopathological images, genomic data, and clinical information, has become increasingly important. However, extracting effective predictive features from these complex data remains challenging for survival analysis owing to the high dimensionality and heterogeneity of histopathology images and genomic profiles. Furthermore, existing methods often lack sufficient interaction between intra- and inter-modal features, which limits model performance. To address these challenges, we developed a deep learning-based multimodal feature fusion model, MultiDeepsurv, designed to predict the survival of gastric cancer patients by integrating histopathological images, clinical data, and gene expression data. Our approach includes a two-branch hybrid network, GLFUnet, which leverages the attention mechanism for enhanced pathology image representation learning. In addition, we employ a graph convolutional network (GCN) to extract features from gene expression data and clinical information. To capture the correlations between modalities, we use the SFusion fusion strategy, which applies a self-attention mechanism to learn latent cross-modal correlations. Finally, the fused features are fed into a Cox regression model for end-to-end survival analysis. Comprehensive experiments and analyses on a gastric cancer cohort from The Cancer Genome Atlas (TCGA) demonstrate that the proposed MultiDeepsurv model outperforms competing methods in prognostic accuracy, achieving a C-index of 0.806 and an AUC of 0.842.
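To make the fusion-plus-Cox pipeline described above concrete, the following is a minimal sketch, not the authors' implementation: it fuses per-modality embeddings with multi-head self-attention and attaches a Cox regression head trained with the negative partial log-likelihood. All module names, dimensions, and the choice of PyTorch are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class SelfAttentionFusion(nn.Module):
    """Fuse per-modality feature vectors with multi-head self-attention (illustrative)."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Cox head: a single linear unit producing the per-patient log-risk score.
        self.risk = nn.Linear(dim, 1)

    def forward(self, image_feat, gene_feat, clinical_feat):
        # Stack the three modality embeddings as a length-3 "token" sequence: (B, 3, dim).
        tokens = torch.stack([image_feat, gene_feat, clinical_feat], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)     # cross-modality interactions
        fused = self.norm(fused + tokens).mean(dim=1)    # residual connection + pooling over modalities
        return self.risk(fused).squeeze(-1)              # log hazard per patient


def cox_partial_likelihood_loss(risk, time, event):
    """Negative Cox partial log-likelihood (Breslow approximation)."""
    order = torch.argsort(time, descending=True)         # sort so risk sets become cumulative sums
    risk, event = risk[order], event[order]
    log_cum_hazard = torch.logcumsumexp(risk, dim=0)     # log of summed exp(risk) over each risk set
    return -((risk - log_cum_hazard) * event).sum() / event.sum().clamp(min=1)
```

In this sketch, the attention layer lets each modality attend to the others before pooling, which is one straightforward way to realize the cross-modal interaction the abstract attributes to SFusion; the actual architecture in the paper may differ.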