Abstract
BACKGROUND: Nasopharyngeal carcinoma (NPC) remains highly sensitive to radiotherapy; however, radioresistance in a subset of patients leads to local recurrence and distant metastasis. Serum proteomics provides a minimally invasive approach to capturing dynamic physiological changes, and machine learning enables efficient construction of predictive models. This study aimed to develop and validate a serum proteomics–based machine-learning model for predicting radiotherapy sensitivity in NPC. METHODS: Pretreatment serum samples from newly diagnosed NPC patients were analyzed using SELDI-TOF-MS. Differentially expressed proteins between radiosensitive and radioresistant groups were identified using limma. GO and KEGG analyses were performed to explore functional enrichment. Twelve machine-learning algorithms were used to construct predictive models, and the top-performing models were optimized through feature selection. A Random Forest model with seven features was identified as the optimal model. External validation was performed using an independent cohort with ELISA-quantified protein levels. Model performance was assessed using receiver operating characteristic (ROC) curve analysis, calibration analysis, decision curve analysis (DCA), and 10-fold cross-validation. SHapley Additive exPlanations (SHAP) analysis was applied for model interpretability, and the final model was deployed as a Shiny app. RESULTS: A total of 96 differentially expressed proteins were identified, involving multiple biological functions and signaling pathways. The Random Forest model demonstrated the best predictive performance, achieving an area under the curve (AUC) of 0.963 in the training set and 0.975 in the validation set. Cross-validation yielded an average AUC of 0.965. DCA indicated high clinical utility across a broad threshold range, and calibration curves showed good agreement between predicted and observed outcomes.
Seven proteins (PLXND1, GSR, PGD, PTPRC, OR2T29, ACTG2, CHAD) were selected as final features. SHAP analysis provided global and individual-level interpretability. A web-based tool was developed to facilitate clinical application. CONCLUSION: This study establishes a robust serum proteomics–based machine-learning model capable of accurately predicting radiotherapy sensitivity in NPC. The model is clinically interpretable and practical to implement, supporting personalized radiotherapy decision-making. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-026-07847-2.
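ILLUSTRATIVE NOTE: The AUC values reported above can be read as the probability that a randomly chosen patient from the positive class receives a higher predicted score than a randomly chosen patient from the negative class. The following is a minimal sketch of that definition only, not the study's pipeline. The function name `roc_auc`, the coding of the positive class as 1, and the toy labels and scores are all assumptions made for illustration and are not study data.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the AUC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class, an assumed coding).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Compare every positive score with every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not taken from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In the toy example, three of the four positive/negative pairs are ranked correctly, which gives 0.75. The pairwise form is quadratic in the number of samples and is meant for clarity; library implementations typically use a rank-based computation that gives the same value.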