Abstract
BACKGROUND: Primary gastrointestinal melanoma (PGIM) is a rare and highly aggressive malignancy with a poor prognosis, and accurate survival prediction models are currently lacking. This study aimed to analyze the clinical characteristics of PGIM and to develop machine learning models that predict 1-, 3-, and 5-year overall survival (OS).
METHODS: We retrospectively analyzed patients diagnosed with PGIM from January 2000 to December 2021 in the Surveillance, Epidemiology, and End Results (SEER) database. Patients were randomly split into training and testing sets at an 8:2 ratio. Six algorithms were used to construct survival prediction models: Cox proportional hazards, least absolute shrinkage and selection operator (LASSO) regression, Random Forest, Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Neural Network. Model performance and clinical utility were evaluated using the concordance index (C-index), area under the curve (AUC), Brier score, and decision curve analysis (DCA). For the best-performing model, we used variable importance ranking, SHapley Additive exPlanations (SHAP) plots, individual prediction probability plots, and single-case prediction plots for interpretability, and deployed the model as an online tool for clinical use.
RESULTS: A total of 1,060 patients were included, with 845 in the training set and 215 in the testing set. Tumors were predominantly located in the rectum and anus. Median survival was 16 months (mean: 30.3 months), and the 1-, 3-, and 5-year OS rates were 61%, 30%, and 20%, respectively. Among the six algorithms, the Random Forest survival model demonstrated superior performance: in the training set, the C-index was 0.732, with AUCs of 0.813, 0.808, and 0.840 and Brier scores of 0.175, 0.164, and 0.130 for 1-, 3-, and 5-year OS, respectively. DCA confirmed high clinical net benefit. In the testing set, the model remained robust. Model analysis identified clinical stage, age, and surgical treatment as the three most critical prognostic factors.
CONCLUSIONS: The Random Forest survival model outperformed the other five algorithms in predicting OS for patients with PGIM and demonstrated strong generalizability and clinical applicability, making it a valuable tool for clinical decision-making.
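To make the C-index used in the evaluation concrete, the following is a minimal sketch of a concordance calculation for right-censored survival data. It assumes Harrell's formulation, which the abstract does not specify, and the function name `harrell_c_index` and the five-patient toy data are hypothetical illustrations, not values from the SEER cohort or the authors' pipeline.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times (e.g. months)
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is only usable when the shorter time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data (assumption, for illustration only).
times = [5, 12, 16, 30, 48]          # months of follow-up
events = [1, 1, 0, 1, 0]             # 1 = death, 0 = censored
risk = [0.9, 0.7, 0.8, 0.3, 0.2]     # predicted risk scores

print(harrell_c_index(times, events, risk))  # 7 of 8 comparable pairs agree: 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported training-set C-index of 0.732 means that roughly 73% of comparable patient pairs were ordered correctly by predicted risk.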