Abstract
BACKGROUND: This study aimed to use a variety of machine learning (ML) algorithms to build a model predicting the risk of social anxiety among nursing students, to select the optimal model, and to identify risk factors. METHODS: A cross-sectional survey was conducted among nursing students at 10 universities from September to December 2024. A total of 2,024 nursing students were included. Nine features were selected through logistic regression analysis. We developed and evaluated seven ML models: logistic regression (LR), elastic net (EN), k-nearest neighbors (KNN), decision tree (DT), extreme gradient boosting (XGBoost), support vector machine (SVM), and random forest (RF). RESULTS: Among the seven models, the random forest model achieved the highest area under the receiver operating characteristic curve (AUC = 0.71) for predicting social anxiety in nursing students. The most important predictors of social anxiety were sleep condition, alexithymia, depression, education level, and religious belief. CONCLUSION: Our findings suggest that, among the ML models evaluated, random forest best predicts the risk of social anxiety among nursing students.
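The AUC used to rank the seven models can be read as the probability that a randomly chosen student with social anxiety receives a higher predicted risk than a randomly chosen student without it. Under this reading, an AUC of 0.71 means such a pair is ranked correctly about 71% of the time. The minimal Python sketch below computes this rank-based AUC on hypothetical toy data. It is illustrative only, does not reproduce the study's pipeline, and the function name `roc_auc` and all data values are our own assumptions.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based ROC AUC (hypothetical helper, not from the study).

    Returns the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a correct ranking. This is
    the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = social anxiety, 0 = none.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]  # predicted risk from some model
print(roc_auc(labels, scores))  # 10 of 12 pairs ranked correctly, about 0.833
```

The pairwise loop is quadratic and meant only for clarity; for real data, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.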