Abstract
BACKGROUND: Accurate assessment of preterm birth (PTB) risk allows early symptom management to prolong gestation or prevent PTB and can improve perinatal survival and treatment outcomes. We aimed to develop a prognostic model for predicting PTB using machine learning (ML) algorithms. METHODS: This retrospective cohort study used birth records of women who delivered at Khaleej-e-Fars Hospital between January 1, 2020, and January 1, 2022, sourced from the “Iranian Maternal and Neonatal Network (IMaN Net),” an official national database. The main outcome was PTB, defined as birth before 37 weeks of gestation; only spontaneous PTBs were included. Predictors were classified as (1) socio-demographic factors, (2) maternal chronic medical conditions and past obstetric history, and (3) current pregnancy characteristics and complications. We applied several supervised ML techniques to predict PTB. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, and recall. RESULTS: Of the 8853 births during the study period, 1230 (13.9%) were PTB cases. Based on the Chi-square test, PTB was associated with nationality, advanced maternal age, a history of previous PTB, threatened miscarriage during the current pregnancy, intrauterine growth retardation, and maternal comorbidities such as chronic hypertension, pre-eclampsia, urinary tract infection, and substance use. The AUC-ROC of the models ranged from 0.57 to 0.65, with the feed-forward deep learning model showing the highest value (0.65). Values in this range indicate poor discriminative ability in predicting PTB for all examined models.
Further analysis showed the following accuracy for each model: random forest classification (0.65), decision tree classification (0.64), XGBoost classification (0.63), feed-forward deep learning (0.61), light gradient-boosting (0.60), permutation feature classification with KNN (0.58), linear regression (0.57), and logistic regression (0.57). CONCLUSIONS: Combining a clinical database with various ML algorithms did not yield acceptable predictive power for PTB. Further research is required to improve the accuracy of predictions.
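
For readers less familiar with the evaluation metrics named above, the following is a minimal Python sketch of how AUC-ROC, accuracy, precision, and recall can be computed for a binary PTB classifier. It is not the study's code: the labels, scores, 0.5 decision threshold, and all function names are illustrative assumptions. Only the cohort counts (8853 births, 1230 PTB cases) come from the abstract.

```python
from typing import Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = preterm, 0 = term."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Prevalence reported in the abstract: 1230 PTB cases among 8853 births.
prevalence_pct = round(100 * 1230 / 8853, 1)

# Hypothetical toy data (NOT from the study): true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.3, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all births classified correctly
precision = tp / (tp + fp)           # share of predicted PTBs that were truly PTB
recall = tp / (tp + fn)              # share of true PTBs that the model caught
auc = auc_roc(y_true, scores)        # threshold-free ranking quality

print(f"prevalence={prevalence_pct}% accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f} auc={auc:.3f}")
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds; an AUC of 0.5 corresponds to chance-level ranking.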