Abstract
OBJECTIVE: To construct and validate a clinical model that predicts the risk of painful diabetic peripheral neuropathy (PDPN) in patients with type 2 diabetes mellitus (T2DM), supporting early identification and intervention in primary care. METHODS: A total of 1,984 patients with T2DM were included in the analysis. After data preprocessing and application of the Synthetic Minority Oversampling Technique (SMOTE) with a 200% oversampling ratio, feature selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. Six predictive models (multivariable logistic regression [LR], random forest [RF], extreme gradient boosting [XGBoost], Light Gradient Boosting Machine [LightGBM], artificial neural network [ANN], and support vector machine [SVM]) were developed and tuned using repeated 5-fold cross-validation. Model performance was evaluated on an independent test cohort using discrimination and calibration metrics. To enhance clinical interpretability, a nomogram and SHapley Additive exPlanations (SHAP) analysis were used to visualize predictor contributions. RESULTS: Among the 1,984 patients, 81 (4.08%) had PDPN. Ten variables were selected as predictors. The LR model demonstrated the most favorable trade-off for screening purposes, with an area under the receiver operating characteristic curve (AUC-ROC) of 0.894 (95% CI: 0.814-0.964), an area under the precision-recall curve (PR-AUC) of 0.470 (95% CI: 0.258-0.665), and a balanced accuracy of 0.826 (95% CI: 0.667-0.932). SHAP analysis identified musculoskeletal disorders and HbA1c as the most influential predictors. A user-friendly dynamic web-based nomogram was constructed to support clinical implementation. CONCLUSION: We established and validated a clinically interpretable model for PDPN risk prediction in patients with T2DM. The dynamic nomogram enables individualized risk estimation and may assist timely intervention in routine practice.
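The three headline metrics behave differently when the outcome is rare, as it is here (81 of 1,984 patients). The following is a minimal pure-Python sketch of how AUC-ROC, PR-AUC (computed here as step-wise average precision), and balanced accuracy can be calculated. It is illustrative only: the function names, the 0.5 decision threshold, and the ten-patient toy data are assumptions introduced for this example and are not taken from the study, whose exact metric implementations are not stated in the abstract.

```python
from typing import Sequence

# Prevalence reported in the abstract: 81 PDPN cases among 1,984 patients.
# A classifier with no skill has an expected PR-AUC close to this value.
prevalence = 81 / 1984


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: mean of the precision observed at the rank of each
    positive case, scanning from the highest score down.
    Assumes untied scores; tied scores would need grouping."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = PDPN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical toy data: 10 patients, 2 with PDPN (NOT study data).
toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
toy_scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.70, 0.25, 0.40, 0.35]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

print(f"prevalence        = {prevalence:.4f}")
print(f"AUC-ROC           = {roc_auc(toy_labels, toy_scores):.4f}")
print(f"PR-AUC (avg prec) = {average_precision(toy_labels, toy_scores):.4f}")
print(f"balanced accuracy = {balanced_accuracy(toy_labels, toy_preds):.4f}")
```

Because the no-skill baseline for PR-AUC is roughly the outcome prevalence (about 0.041 in this cohort) rather than 0.5, a PR-AUC of 0.470 is far above chance even though it looks modest next to the AUC-ROC of 0.894.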