Abstract
BACKGROUND: Preterm birth (PTB) occurs in approximately 11% of all births worldwide, resulting in significant morbidity and mortality for both mothers and their offspring. Identifying pregnancies at risk of PTB early in gestation may allow more effective intervention and reduce its incidence. Plasma cell-free DNA (cfDNA), derived from the placenta and maternal tissues, serves as a dynamic indicator of biological processes and pathological changes in pregnancy. These properties make cfDNA a valuable biomarker for investigating pregnancy complications, including PTB.

METHODS AND FINDINGS: To date, few PTB prediction methods have been developed with large sample sizes and high-throughput screening and then validated in independent cohorts. To address this gap, we established a large-scale, multi-center case-control study of 2,590 pregnancies (2,072 full-term and 518 preterm) from three independent hospitals to develop a classifier for spontaneous preterm birth. We performed whole-genome sequencing on cfDNA and focused on promoter profiling, defined as the read depth of promoter regions spanning -1 kb to +1 kb around transcription start sites. Using four machine learning models and two feature selection algorithms, we developed classifiers for predicting preterm birth. Among these, the classifier based on the support vector machine model, named PTerm (Promoter profiling classifier for preterm prediction), achieved the highest area under the curve (AUC), 0.878 (0.852-0.904), under leave-one-out cross-validation. PTerm also performed well in three independent validation cohorts, achieving an overall AUC of 0.849 (0.831-0.866).

CONCLUSIONS: PTerm demonstrated high accuracy in predicting preterm birth. It can also be applied to existing non-invasive prenatal test data without changing the testing procedure or increasing detection cost, making it easily adaptable for preclinical tests.