Abstract
Preeclampsia (PE) is a disease that seriously threatens the health of pregnant women, and early intervention significantly reduces its incidence in high-risk mothers. To identify mothers at high risk of PE with at least a specified level of confidence, we aimed to develop a framework for early prediction of PE that provides a risk score together with an associated uncertainty score reflecting missing data in clinical datasets. We built a machine learning model using a multi-center retrospective clinical dataset of 31,235 singleton pregnancies. To quantify the uncertainty resulting from missing data, we assessed the contribution of each variable to prediction variability using SHapley Additive exPlanations (SHAP) values. The uncertainty score for each sample was calculated by summing the contributions of its missing variables. Predictive performance was evaluated on samples with uncertainty scores below specific thresholds, using both internal validation and external validation on an independent cohort. Internal validation revealed a strong inverse correlation between the uncertainty score threshold and AUROC (Spearman correlation coefficient: -0.999). At the threshold of 0.11 of the minimum possible level, the AUROC reached 0.978, compared to 0.845 when uncertainty was not considered. In external validation, the AUROC reached 0.994 at the same threshold, compared to 0.693 when uncertainty was not considered. Our framework demonstrated high predictive performance in low-uncertainty samples, supporting its stability and effectiveness. This approach reduces the risk of overconfidence in high-uncertainty predictions and offers a more reliable method for PE prediction.
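The scoring and thresholded evaluation described above can be illustrated with a minimal sketch. It is not the study's implementation. The per-variable contributions are assumed to be precomputed weights (for example, normalized SHAP-based values), and the variable names, the weights, the toy samples, and the choice to keep samples at or below the threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Sample = Dict[str, Optional[float]]


def uncertainty_score(sample: Sample, contributions: Dict[str, float]) -> float:
    """Sum the contributions of the variables that are missing in this sample.

    A variable counts as missing when it is absent from the dict or set to None.
    """
    return sum(w for name, w in contributions.items() if sample.get(name) is None)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_below_threshold(
    samples: List[Sample],
    labels: Sequence[int],
    risk_scores: Sequence[float],
    contributions: Dict[str, float],
    threshold: float,
) -> float:
    """Evaluate AUROC only on samples whose uncertainty score is <= threshold.

    Using <= rather than < is an assumption of this sketch.
    """
    keep = [
        i for i, s in enumerate(samples)
        if uncertainty_score(s, contributions) <= threshold
    ]
    return auroc([labels[i] for i in keep], [risk_scores[i] for i in keep])


# Hypothetical weights and toy data, for illustration only.
contributions = {"mean_arterial_pressure": 0.5, "bmi": 0.3, "plgf": 0.2}
samples = [
    {"mean_arterial_pressure": 105.0, "bmi": 31.0, "plgf": 40.0},  # nothing missing
    {"mean_arterial_pressure": 85.0, "bmi": 22.0, "plgf": None},   # plgf missing
    {"mean_arterial_pressure": None, "bmi": None, "plgf": 55.0},   # two missing
    {"mean_arterial_pressure": 88.0, "bmi": 24.0, "plgf": 90.0},   # nothing missing
]
labels = [1, 0, 1, 0]
risk_scores = [0.9, 0.2, 0.1, 0.3]

for s in samples:
    print(round(uncertainty_score(s, contributions), 3))
print(auroc(labels, risk_scores))
print(auroc_below_threshold(samples, labels, risk_scores, contributions, 0.25))
```

In this toy data the third sample is missing its two most influential variables and receives a misleadingly low risk score; excluding it at the chosen threshold raises the AUROC on the retained samples, which mirrors the behavior the abstract reports.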