Abstract
BACKGROUND: Fetal hypoxia is a leading cause of neonatal morbidity and mortality. Cardiotocography (CTG) is widely used to predict fetal hypoxia during labor, but its interpretation remains suboptimal. Artificial intelligence (AI) models have been developed for CTG interpretation, but their clinical utility is limited by two major challenges: demonstrating superiority over human experts and ensuring explainability in real-world settings. METHODS: A large dataset containing CTG traces from three tertiary hospitals between January 2014 and May 2022 was built for model development. Deep learning architectures, named Cardiotocography Artificial-intelligence Predictors (CAPs), were trained to predict fetal hypoxia from CTG traces based on CNN (CAP-C), Transformer (CAP-T), LSTM (CAP-L), and CfC (CAP-CfC) algorithms. The outcome was fetal hypoxia, determined by either a low Apgar score (≤ 7 at 1 or 5 min) or umbilical artery acidemia (grade 1: umbilical artery pH (pHa) < 7.20; grade 2: pHa < 7.15; grade 3: pHa < 7.10). Model performance was measured by the area under the receiver operating characteristic curve (AUROC), evaluated through a nationwide AI-human comparison and validated on the CTU-UHB dataset. Gradient-weighted class activation mapping (Grad-CAM) was applied to highlight the CTG regions that contributed most to the model's predictions. RESULTS: A total of 20,780 CTG traces were obtained for model development, and 467 cases were held out for the nationwide AI-human comparison. Among all models, CAP-L achieved the highest AUROC in predicting fetal hypoxia (grade 1: 0.758, 95% CI: 0.754-0.761; grade 2: 0.770, 95% CI: 0.764-0.776; grade 3: 0.716, 95% CI: 0.700-0.732). In comparison with 10,571 expert responses, all CAP models achieved a higher AUROC (0.757-0.789 vs. 0.715; P < 0.05, DeLong test). On the public CTU-UHB dataset, CAP-L achieved AUROCs of 0.709, 0.727, and 0.730 in predicting fetal hypoxia with grade 1, 2, and 3 acidemia, respectively.
Grad-CAM analysis showed that the CAP models leveraged variable and prolonged decelerations to predict fetal hypoxia, a finding verified by a perturbation-based faithfulness test. CONCLUSIONS: The CAP algorithms developed in this study showed superior performance in detecting fetal hypoxia from CTG traces compared to human experts, and demonstrated promising explainability, supporting clinical CTG interpretation. TRIAL REGISTRATION: Clinical trial registration numbers: ChiCTR2100045316, ChiCTR2100052695, ChiCTR2400085338.
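
The outcome definition given in METHODS can be illustrated with a minimal sketch. It is not the authors' code: the function and variable names are hypothetical, and it assumes that, for a given acidemia grade, a case is labeled as fetal hypoxia when either the Apgar criterion or the pHa criterion for that grade is met, and that a missing measurement does not satisfy either criterion. Only the thresholds come from the abstract.

```python
from typing import Optional

# pHa cut-offs per acidemia grade, as stated in the abstract.
PHA_THRESHOLDS = {1: 7.20, 2: 7.15, 3: 7.10}


def low_apgar(apgar_1min: Optional[int], apgar_5min: Optional[int]) -> bool:
    """True if the Apgar score is <= 7 at 1 or 5 min (missing scores are ignored)."""
    return any(s is not None and s <= 7 for s in (apgar_1min, apgar_5min))


def acidemia(pha: Optional[float], grade: int) -> bool:
    """True if umbilical artery pH is below the cut-off for the given grade."""
    return pha is not None and pha < PHA_THRESHOLDS[grade]


def fetal_hypoxia(
    apgar_1min: Optional[int],
    apgar_5min: Optional[int],
    pha: Optional[float],
    grade: int,
) -> bool:
    """Assumed labeling rule: low Apgar OR acidemia at the given grade."""
    return low_apgar(apgar_1min, apgar_5min) or acidemia(pha, grade)
```

Under these assumptions, a newborn with Apgar scores of 9 and 10 and a pHa of 7.18 would be labeled positive at grade 1 but negative at grades 2 and 3, which shows how the same case can change label as the acidemia threshold tightens.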