Abstract
OBJECTIVE: Therapeutic outcomes after immune checkpoint inhibitor (ICI) therapy in hepatocellular carcinoma (HCC) are highly heterogeneous, and accurate prognostic assessment is essential for risk stratification and clinical management. This study aimed to develop and validate an interpretable deep-learning survival model, TabNet-Cox, for predicting overall survival (OS) in ICI-treated HCC patients.

METHODS: A total of 453 consecutive HCC patients treated with ICIs at Harbin Medical University Cancer Hospital between January 2018 and December 2023 were retrospectively enrolled and randomly assigned to a training cohort (n = 339) and an internal validation cohort (n = 114). An independent external validation cohort of 105 patients was collected from the Second Affiliated Hospital of Harbin Medical University under the same inclusion criteria. Baseline demographic variables, tumor characteristics, pretreatment management categories (surgery, locoregional therapy, or none), and laboratory parameters were used to develop TabNet-Cox. Model performance was assessed under a repeated 5-fold cross-validation protocol and further evaluated in the internal and external cohorts using the concordance index (C-index), area under the receiver operating characteristic curve (AUC), and Brier score. SHapley Additive exPlanations (SHAP) and unsupervised clustering were applied for interpretability and phenotype exploration. Clinical utility was examined using decision curve analysis (DCA) with Barcelona Clinic Liver Cancer (BCLC) stage as the reference.

RESULTS: TabNet-Cox showed the best overall performance among the survival models compared, achieving a C-index of 0.79, an AUC of 0.81, and the lowest Brier score (0.059) in the development setting. In the external validation cohort, TabNet-Cox demonstrated stable discriminative performance, with well-defined ROC curves and good calibration. Using the prespecified risk cut-off, the model stratified patients into distinct risk groups with significantly separated Kaplan-Meier survival curves (P < 0.001). SHAP analysis highlighted alpha-fetoprotein (AFP), gamma-glutamyl transferase (GGT), and lactate dehydrogenase (LDH) as major risk contributors, whereas albumin and lymphocyte count were protective. Unsupervised clustering within high-risk patients suggested two patterns, a tumor burden-dominant phenotype and a liver dysfunction-dominant phenotype, which should be interpreted as hypothesis-generating.

CONCLUSION: TabNet-Cox provides an accurate and interpretable framework for OS prediction and risk stratification in ICI-treated HCC using routinely available baseline variables. Resampling-based evaluation and independent external validation support its potential value for individualized prognostic assessment.
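
For readers unfamiliar with the primary discrimination metric, the following is a minimal illustrative sketch of how a concordance index can be computed from right-censored survival data. It is not the study's implementation: the function name `harrell_c_index`, its arguments, and the simplified handling of tied survival times are assumptions made here for illustration, and the toy values are not study data.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher = shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: pairs with tied times are skipped
        if not events[a]:
            continue  # not comparable if the earlier time is censored
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk was assigned to the earlier death
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical values): risk decreases as survival time increases.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.2]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

In this toy example every comparable pair is ranked correctly, so `toy_c` equals 1.0; a model with no discriminative ability would score about 0.5.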