Abstract
OBJECTIVES: Whether gestational hypertension (GH) and pre-eclampsia (PE) are distinct entities or different points on the spectrum of a single disease remains debated, and few studies have compared risk factors for GH and PE within the same population. This study aims to identify the key risk factors that differentiate GH from PE and to develop a machine learning (ML)-based PE prediction model suited to the clinical needs of low-resource settings. METHODS: This single-center retrospective analysis included 1,157 pregnant women who established prenatal care records and delivered at Hangzhou Women's Hospital between 2019 and 2022. Five ML models were developed and compared: adaptive boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), random forest (RF), logistic regression (LR), and gradient boosting (GB). The data set was divided into a training set (70%) and a validation set (30%) for internal validation, and the optimal prediction model and its important features were identified. External data were then used to further validate the predictive performance of the optimal model, and its performance was compared with that of the LR model. RESULTS: The XGBoost model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.930 in training, 0.763 in validation, and 0.843 in testing. The important features were thyroid-stimulating hormone (TSH), age, mean corpuscular volume (MCV), triglycerides (TG), D-dimer, albumin (ALB), and uric acid (UA), with TSH having the strongest influence on the model's predictions. Mean SHapley Additive exPlanations (SHAP) values provided model interpretability. CONCLUSIONS: The XGBoost-based model predicts the risk of PE with acceptable performance and interpretability, and its overall performance is superior to that of the traditional LR method.
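The evaluation design described above (a 70%/30% split, with models compared by AUC) can be illustrated with a minimal sketch. The abstract does not state which software was used, so this is not the study's implementation; the function names `stratified_split` and `roc_auc` are hypothetical, stratifying the split by outcome is an assumption, and the labels and scores are made-up toy values.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), keeping the class ratio in both parts.

    Stratification is an assumption; the abstract only states a 70/30 split.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        valid_idx += idx[cut:]
    return sorted(train_idx), sorted(valid_idx)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: 30 hypothetical positive cases and 70 negative cases.
toy_labels = [1] * 30 + [0] * 70
train_idx, valid_idx = stratified_split(toy_labels)

# Toy predicted risks for four cases (two negative, two positive).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice each fitted model would produce predicted risks for the validation cases, and `roc_auc` would be applied per model to rank them. The pairwise definition used here is quadratic in sample size, which is acceptable for a cohort of about a thousand cases.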