Abstract
Prediction of small-for-gestational-age (SGA) and large-for-gestational-age (LGA) infants using routinely collected antenatal data remains suboptimal, particularly among nulliparous women. In this study, models for SGA (< 10th percentile) and LGA (> 90th percentile) were developed by combining grandmaternal pregnancy-related information and maternal birth characteristics ("G0 predictors") with maternal clinical factors available at 26 weeks' gestation ("G1 predictors"). The study used a cohort of first-born, singleton births to nulliparous women in Nova Scotia, Canada (1981-2011), and their mothers, drawn from the Nova Scotia Atlee Perinatal Database. Models using G0 predictors, G1 predictors, and their combination were developed with Super Learner, an ensemble machine learning algorithm, and internally validated using nested cross-validation. Discrimination was assessed via the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR); calibration was also evaluated. Among 9,097 grandmother-mother-infant triads, 902 (9.9%) infants were SGA and 891 (9.8%) were LGA. Including G0 predictors improved discrimination compared with G1-only models (AUC-ROC 0.69 vs. 0.66 for SGA and 0.71 vs. 0.66 for LGA; AUC-PR 0.21 vs. 0.18 for SGA and 0.22 vs. 0.18 for LGA). Models fitted using both sets of predictors were well calibrated. Although incorporating intergenerational information modestly improved prediction, overall predictive performance remained poor.
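For readers less familiar with the two discrimination measures, the following is a minimal sketch of how they can be computed from predicted risks. It is an illustration only: the abstract does not state which AUC-PR estimator the study used, so average precision is assumed here as one common choice, and the function names and toy data are hypothetical. As a point of reference, a non-informative model has an expected AUC-ROC of 0.5 and an expected AUC-PR close to the outcome prevalence (about 0.10 for both outcomes in this cohort).

```python
def auc_roc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, highest score first.

    Assumed estimator of AUC-PR; tied scores are not treated specially.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 2 positives among 10 infants (prevalence 0.2).
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

roc = auc_roc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(f"AUC-ROC={roc:.4f}  AUC-PR (avg. precision)={ap:.4f}  prevalence={prevalence:.2f}")
```

Reporting AUC-PR alongside AUC-ROC is useful for outcomes with roughly 10% prevalence, because precision depends directly on how many false positives accompany each true positive.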