Abstract
OBJECTIVE: This multicentre study aimed to develop and validate a machine learning (ML) model to predict postoperative nausea and vomiting (PONV) in patients undergoing sedated gastrointestinal endoscopy. We compared multiple algorithms, applied SHAP for feature interpretability, and translated the optimized model into a web-based tool.
METHODS: A total of 745 patients were prospectively enrolled from four tertiary hospitals in China, comprising a development cohort of 428 patients from the First Affiliated Hospital of Zhengzhou University (July-December 2023) and an external validation cohort of 317 patients from three other institutions (June-August 2024). Eligible patients were aged 18-80 years with ASA physical status I-III. Patients were excluded for severe cardiopulmonary comorbidities, >30% missing data, complications or withdrawal. Eleven ML algorithms were trained using demographic, clinical and procedural variables. Model performance was assessed by AUC, accuracy, precision, recall, F1-score, specificity and Cohen's kappa. Calibration and SHAP analyses were conducted, and the final model was deployed as a Streamlit-based tool.
RESULTS: The study enrolled 745 patients (428 in the internal training cohort and 317 in the external validation cohort). The incidence of PONV did not differ significantly between cohorts (29.0% vs. 29.6%, p = .85), but the cohorts differed in weight, opioid use, examination type, anaesthesia duration, smoking history and diastolic blood pressure (p < .05). Among the 11 ML models evaluated, linear discriminant analysis (LDA) showed the best generalizability in external validation (AUC: 0.834 [95% CI, 0.761-0.927]), outperforming logistic regression and support vector machines, which achieved AUC > 0.800 in internal testing. The optimally calibrated LDA model was used to build a real-time risk prediction tool (https://p9xjczqdwwf7obxf6u7jeo.streamlit.app/), and SHAP interpretability analysis identified prior PONV history, height, examination type and opioid use as the primary predictors.
CONCLUSIONS: LDA demonstrated superior generalizability and was implemented as a web-based risk prediction tool, enabling real-time PONV assessment and supporting individualized perioperative management.
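The evaluation metrics named in the abstract (AUC, accuracy, precision, recall, F1-score, specificity and Cohen's kappa) can be illustrated with a minimal sketch. It assumes binary labels (1 = PONV, 0 = no PONV) and a 0.5 decision threshold. The function names, the threshold and the toy data are hypothetical and are not taken from the study, whose actual implementation is not described here.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float
    kappa: float


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from paired 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return BinaryMetrics(accuracy, precision, recall, specificity, f1, kappa)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_metrics = binary_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

AUC is computed from the predicted scores and does not depend on the threshold, whereas the other six metrics change with the chosen cut-off. This is one reason a study of this kind would typically report both.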