Abstract
BACKGROUND: Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth. OBJECTIVE: This study aimed to identify differentially methylated CpG sites in cord blood and to develop machine learning models based on these sites to predict preterm birth risk. METHODS: Methylome data from 110 neonatal cord blood samples in the GSE110828 dataset (88 for training and 22 for testing) were analyzed to identify CpG sites whose methylation differed between preterm and full-term births. Key CpG sites were selected using Lasso, Elastic Net, and Random Forest. Forty-five predictive models were constructed and evaluated for accuracy, precision, recall, and F1 score. RESULTS: Sixty-six CpG sites differed significantly between the preterm and full-term groups. Four models, including Random Forest combined with Lasso and Gradient Boosting combined with Random Forest, achieved the best predictive performance, each with a validation accuracy of 93.75%. CONCLUSION: DNA methylation changes at CpG sites in cord blood are associated with preterm birth risk. CpG-based methylation models showed high predictive accuracy and hold promise for early clinical risk assessment.
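
The four evaluation metrics named in the METHODS can be illustrated with a minimal sketch for a binary preterm versus full-term classifier. The function name `classification_metrics`, the label coding (1 = preterm, 0 = full-term), and the example labels are assumptions made for illustration only; they are not the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper: `positive` marks the class treated as the
    event of interest (assumed here to be preterm birth, coded 1).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for eight samples (1 = preterm, 0 = full-term); not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy matters when the preterm and full-term groups are unequal in size, since accuracy alone can look high for a model that rarely predicts the smaller class.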