Abstract
INTRODUCTION AND AIMS: Distant metastasis (DM) in tongue squamous cell carcinoma is associated with poor prognosis, yet reliable tools for immediate postoperative risk prediction are lacking. This study aimed to develop and validate an interpretable machine learning (ML) model for early risk stratification of DM.
METHODS: The model development cohort comprised 752 patients from Sun Yat-sen Memorial Hospital. External validation cohorts 1 and 2 comprised 234 patients from Nanfang Hospital and 105 patients from the HANCOCK database, respectively. Variables were selected using 3 feature selection methods, and 6 ML models were developed. Model performance was evaluated using metrics including the area under the receiver operating characteristic curve (AUC), and interpretability analysis was conducted using Shapley Additive Explanations (SHAP).
RESULTS: A total of 1091 patients were included, and 7 key predictors were ultimately identified: histological grade, lymphovascular invasion, number of regional lymph node metastases, maximum tumour diameter, depth of invasion, neutrophil-to-lymphocyte ratio and monocyte-to-lymphocyte ratio. The Elastic Net model demonstrated the best performance, with AUCs of 0.935 (95% CI 0.882-0.988) in the internal validation cohort, 0.889 (95% CI 0.800-0.978) in external validation cohort 1 and 0.905 (95% CI 0.808-1.000) in external validation cohort 2. The median time to DM was 11.9 months. The model enabled immediate postoperative risk stratification. SHAP analysis enhanced model interpretability, and an online prediction platform was developed for clinical application (https://nfyy-stomatology-dept.shinyapps.io/Predict-DM-of-TSCC/).
CONCLUSIONS: This study developed and validated an interpretable ML model using multicentre real-world data for immediate postoperative DM risk stratification. The model provides a reliable and clinically applicable tool to support individualised patient management.
CLINICAL RELEVANCE: The model enables immediate postoperative identification of high-risk patients, providing approximately 12 months of earlier risk recognition compared with routine imaging.
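For readers unfamiliar with the discrimination metric reported above, the following is a minimal sketch of how an AUC and a 95% confidence interval can be computed from predicted risks. It is illustrative only. The labels and scores are invented toy values, not study data. The function names `auc` and `bootstrap_ci` are hypothetical. The percentile bootstrap is an assumption, since the abstract does not state how the confidence intervals were derived.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method).
    Resamples containing only one class are discarded and redrawn."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy example: 1 = distant metastasis, 0 = no distant metastasis (invented values).
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.60, 0.70, 0.90]

point_estimate = auc(toy_labels, toy_scores)
ci_lower, ci_upper = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {point_estimate:.3f} (95% CI {ci_lower:.3f}-{ci_upper:.3f})")
```

Because the AUC cannot exceed 1, an interval computed this way never extends above 1.000, whereas intervals from methods based on a normal approximation can and are typically truncated at that bound.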