Abstract
OBJECTIVE: This study aimed to develop and validate a machine learning-enhanced screening questionnaire for obstructive sleep apnea (OSA) using gradient boosting, and to establish a clinically deployable visual prediction framework with higher diagnostic accuracy than existing screening tools.
METHODS: We conducted a mixed-methods study analyzing polysomnography data from 4,036 participants collected between September 2019 and August 2025, comprising a retrospective cohort of 3,847 participants and a prospective cohort of 189 participants. We developed a 15-item questionnaire combining items from the modified Epworth Sleepiness Scale (ESS) and the STOP-Bang questionnaire (Snoring, Tiredness, Observed apneas, blood Pressure, Body mass index, Age, Neck circumference, and Gender). We evaluated four machine learning algorithms: XGBoost, support vector machine (SVM), artificial neural network (ANN), and multinomial logistic regression. Performance was assessed using the area under the receiver operating characteristic curve (AUC), net reclassification improvement (NRI), calibration metrics, and decision curve analysis. Propensity score matching (1:4 ratio) was used to address potential confounding.
RESULTS: XGBoost outperformed traditional screening tools, achieving AUC values of 0.92, 0.94, and 0.97 for mild, moderate, and severe OSA, respectively, compared with 0.68 for the STOP-Bang questionnaire and 0.72 for the Berlin questionnaire. The clinical nomogram showed excellent calibration and a C-index of 0.93. SHapley Additive exPlanations (SHAP) analysis identified neck circumference as the most important predictive feature (mean |SHAP| = 0.42), followed by body mass index (0.38) and witnessed apneas (0.35). Economic analysis showed a 39.7% reduction in screening costs and a 3.5-fold increase in case detection efficiency.
CONCLUSION: The gradient boosting-enhanced OSA screening model substantially outperformed established screening questionnaires, offering clinically actionable risk stratification through interpretable visualization while remaining feasible to implement. The approach provides a framework for integrating artificial intelligence into clinical decision support, with potential applications beyond sleep medicine.