Abstract
Surgical resection is the primary curative treatment for intrahepatic cholangiocarcinoma (ICC), yet high postoperative recurrence rates remain a significant challenge. We developed an interpretable, transformer-based deep-learning pipeline that integrates multimodal data (clinical variables, radiomic features, and whole-slide pathology images) by fusing a pre-trained encoder with a transformer network. The model demonstrated robust performance in predicting 2-year overall survival, with area under the curve (AUC) values of 0.952 (95% CI: 0.909-0.983), 0.924 (95% CI: 0.804-1.000), and 0.924 (95% CI: 0.828-0.993) in three independent validation cohorts. To validate the model biologically, we used spatial transcriptomics and proteomics to interpret the attention patterns underlying its predictions. This spatial multi-omics analysis revealed that the model's attention was preferentially focused on regions histologically and molecularly associated with tumor invasion and aggressive behavior. We present a novel, interpretable multimodal deep-learning framework that achieves superior postoperative risk stratification for ICC patients.