Abstract
BACKGROUND: We aim to develop a predictive model to identify children with gastrointestinal congenital malformations at high risk of 30-day mortality following intervention or hospital admission. METHODS: The data used for our analysis were collected as part of the Global PaedSurg research collaboration and include 3849 patients from 74 countries. Data preprocessing, missing data imputation, oversampling of the non-survivor class, and random undersampling of the survivor class were performed prior to training a random forest classifier for mortality prediction. RESULTS: The overall 30-day mortality in our model dataset is 19.5%. The model displays an overall accuracy of 88.84% (CI: 86.79%, 90.66%), with strong precision (84.13%, CI: 78.4%, 88.8%) and sensitivity (89.98%, CI: 87.8%, 91.9%) in identifying non-survivors. The area under the curve (AUC) is 0.941 (CI: 0.924, 0.957) for the non-survivor class. The most important features in the classifier are the diagnosis of a complication, the duration of postoperative antibiotic treatment, the need for parenteral nutrition or ventilation, the American Society of Anesthesiologists (ASA) score, weight upon admission, and the use of a surgical safety checklist. CONCLUSIONS: A random forest classifier is a viable option for short-term mortality prediction for children with gastrointestinal congenital malformations. IMPACT: To our knowledge, this is the first global study to apply machine learning for mortality prediction in children with gastrointestinal malformations, using data from 3849 patients across 74 countries. The random forest model achieves 88.84% (CI: 86.79%, 90.66%) accuracy, with strong precision (84.13%, CI: 78.4%, 88.8%) and sensitivity (89.98%, CI: 87.8%, 91.9%) in identifying at-risk patients.
Key predictors include clinical factors (diagnosis of complications, American Society of Anesthesiologists score, weight on admission, duration of postoperative antibiotic treatment), procedural elements (use of a surgical safety checklist), and socio-demographic variables (continent, income level).
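As an illustration of the class-balancing step and the reported metrics, the following is a minimal Python sketch using only the standard library. It is not the study's code: the function names `rebalance` and `binary_metrics`, the labels `"died"` and `"survived"`, the target class size, and the toy data are all hypothetical. It also assumes plain random oversampling with replacement, whereas the abstract does not state which oversampling method was used.

```python
import random
from collections import Counter


def rebalance(rows, labels, minority, target_per_class, seed=0):
    """Return a shuffled, class-balanced copy of (rows, labels).

    Assumption: the minority class is randomly oversampled WITH replacement
    and all other rows are randomly undersampled WITHOUT replacement, so that
    each group ends up with exactly `target_per_class` rows.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")
    if not len(minority_idx) <= target_per_class <= len(majority_idx):
        raise ValueError("target must lie between the two class sizes")

    # Oversample: keep every original minority row, then draw extra copies.
    extra = target_per_class - len(minority_idx)
    over = minority_idx + [rng.choice(minority_idx) for _ in range(extra)]

    # Undersample: keep a random subset of the majority rows.
    under = rng.sample(majority_idx, target_per_class)

    chosen = over + under
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and sensitivity with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Toy data only: 200 hypothetical patients, 39 of them non-survivors (19.5%).
toy_rows = list(range(200))
toy_labels = ["died"] * 39 + ["survived"] * 161
balanced_rows, balanced_labels = rebalance(
    toy_rows, toy_labels, minority="died", target_per_class=100, seed=1
)
print(Counter(balanced_labels))
```

In this sketch, precision and sensitivity are computed with the non-survivor class as the positive class, matching how the abstract reports them. Resampling of this kind is normally applied to the training split only, so that the evaluation data keep the original class proportions.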