Abstract
OBJECTIVE: Nasopharyngeal carcinoma (NPC) is a radiosensitive malignancy, and radiation dermatitis is a common adverse effect of radiotherapy that can substantially impair quality of life. However, reliable tools for predicting moderate-to-severe radiation dermatitis in patients with NPC remain limited. This study aimed to develop and validate a predictive model for moderate-to-severe radiation dermatitis in patients with NPC using machine learning approaches. METHODS: This retrospective study included 796 patients with NPC treated at Sun Yat-sen University Cancer Center between January 2023 and December 2024. Radiation dermatitis severity was graded according to the Common Terminology Criteria for Adverse Events version 5.0. Patients were randomly divided into training and validation sets at a ratio of 7:3. Least absolute shrinkage and selection operator regression was used for feature selection. Missing data in the training set were handled using multiple imputation by chained equations, generating five imputed datasets. To address class imbalance, the synthetic minority oversampling technique was applied within the training set during repeated 10-fold cross-validation. Eight models were developed and compared: gradient boosting machine, neural network, extreme gradient boosting, k-nearest neighbor, light gradient boosting machine, random forest, support vector machine, and logistic regression. Hyperparameters were optimized using grid search with 20-fold cross-validation. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (PR-AUC), calibration, and decision-curve analysis. Shapley additive explanations were used to interpret the final model. RESULTS: Of the 796 patients included, 596 were assigned to the training set and 200 to the validation set. Ten predictors were selected for model development. 
Among the eight models, logistic regression showed the best overall performance and generalizability. It achieved an AUC of 0.721 in the training set and 0.710 in the validation set, the highest validation AUC among the candidate models. Its PR-AUC, calibration, and decision-curve results in the validation set also compared favorably with those of the other models, supporting its robustness and clinical utility in the setting of class imbalance. Model interpretation identified lymph node volume and immunotherapy as the most influential predictors. A nomogram based on the final logistic regression model was constructed to support individualized risk estimation. CONCLUSIONS: A logistic regression model showed acceptable discrimination (validation AUC 0.710) for predicting moderate-to-severe radiation dermatitis in patients with NPC and may support individualized risk stratification in clinical practice. The derived nomogram may help identify high-risk patients early and inform preventive management strategies.
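For readers less familiar with the evaluation metrics named in the Methods, the following is a minimal sketch of how AUC, PR-AUC (computed here as average precision), and decision-curve net benefit can be calculated from predicted risks. It is not the study's code: the function names are hypothetical, the labels and risks are made-up toy values rather than patient data, and the PR-AUC helper assumes no tied scores.

```python
# Toy illustration only: labels and predicted risks are invented, not study data.

def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case is ranked
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise PR-AUC: mean of the precision values at each rank
    where a positive case is retrieved (assumes no tied scores)."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = moderate-to-severe dermatitis) and model risks.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

print(f"AUC: {roc_auc(y_true, y_score):.3f}")                       # 0.867
print(f"PR-AUC: {average_precision(y_true, y_score):.3f}")          # 0.806
print(f"Net benefit at 0.5: {net_benefit(y_true, y_score, 0.5):.3f}")  # 0.250
```

Under class imbalance, PR-AUC is often reported alongside AUC because it depends on the precision among predicted positives, which AUC does not capture directly. Net benefit weights false positives by the odds of the chosen threshold probability.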