Abstract
PURPOSE: Our objectives were to develop a machine learning (ML) model, trained on retrospective electronic health record (EHR) data, to predict the risk of vomiting within a 96-hour window after admission to the pediatric oncology and hematopoietic cell transplant (HCT) services, and to evaluate the model prospectively in a silent trial. PATIENTS AND METHODS: Admissions to the oncology or HCT services from 2018-06-02 to 2024-02-17 (retrospective phase) and from 2024-05-09 to 2024-08-05 (prospective phase) were included. The data source was SEDAR, a curated and validated approach to delivering EHR data for ML. The prediction time was 08:30 on the morning following admission. The outcome was any vomiting within 96 h after the prediction time. We trained models using L2-regularized logistic regression, LightGBM, and XGBoost. Training cohorts included the target cohort and all inpatient admissions. RESULTS: There were 7,408 admissions in the retrospective phase and 340 admissions in the prospective silent trial phase. The best-performing model in the retrospective phase was the LightGBM model trained on all inpatients, which used 2,859 features. The area under the receiver operating characteristic curve (AUROC) was 0.730 (95% confidence interval (CI) 0.694-0.765) in the retrospective phase and 0.716 (95% CI 0.649-0.784) in the prospective silent trial phase. CONCLUSIONS: EHR data could be used to develop an ML model that predicts vomiting among pediatric oncology and HCT inpatients, and the model retained satisfactory performance in a prospective silent trial. Future work will include deploying the model into clinical workflows and determining whether it improves vomiting control.
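The labeling rule described in the methods (prediction at 08:30 the morning after admission, outcome of any vomiting in the following 96 h) can be sketched as below. This is a minimal illustration, not the authors' SEDAR pipeline. It assumes that "the morning following admission" means the next calendar day, and that the outcome window is half-open, (t0, t0 + 96 h], so that events at or before the prediction time do not count. The function names `prediction_time` and `vomiting_label` are hypothetical.

```python
from datetime import datetime, time, timedelta

# Values stated in the abstract.
PREDICTION_CLOCK = time(8, 30)        # prediction made at 08:30
OUTCOME_WINDOW = timedelta(hours=96)  # outcome window after prediction time


def prediction_time(admit: datetime) -> datetime:
    """Return 08:30 on the calendar day after admission (assumed reading)."""
    next_day = admit.date() + timedelta(days=1)
    return datetime.combine(next_day, PREDICTION_CLOCK)


def vomiting_label(admit: datetime, vomit_events: list[datetime]) -> int:
    """Return 1 if any vomiting event falls in (t0, t0 + 96 h], else 0.

    The half-open interval is an assumption: events at or before the
    prediction time are excluded because they would already be known
    when the prediction is made.
    """
    t0 = prediction_time(admit)
    end = t0 + OUTCOME_WINDOW
    return int(any(t0 < event <= end for event in vomit_events))


if __name__ == "__main__":
    admit = datetime(2024, 5, 9, 14, 0)
    print(prediction_time(admit))  # 2024-05-10 08:30:00
    print(vomiting_label(admit, [datetime(2024, 5, 12, 3, 0)]))  # 1
    print(vomiting_label(admit, [datetime(2024, 5, 10, 7, 0)]))  # 0, before t0
```

Anchoring every admission to a fixed clock time, rather than to the admission timestamp itself, gives each prediction the same daily position in the clinical workflow. It also fixes which EHR data are available as features: only data recorded before t0.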