Abstract
OBJECTIVE: To develop a machine learning model that predicts early pregnancy outcomes by combining baseline levels and dynamic changes of β-human chorionic gonadotropin (β-hCG), progesterone (P), and estradiol (E2). METHODS: This retrospective study included 421 patients treated at the Lanzhou University Second Hospital between March 2023 and August 2024. Feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest Recursive Feature Elimination (RF-RFE). We then constructed a traditional logistic regression (LR) model and five machine learning models: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), k-Nearest Neighbors (KNN), Multilayer Perceptron (MLP) neural network, and Support Vector Machine (SVM). Internal validation was performed using 5-fold cross-validation. Model performance was measured by the area under the receiver operating characteristic curve (AUC), accuracy, precision, sensitivity, and specificity. RESULTS: Of the 421 enrolled patients, 263 had ongoing pregnancies and 158 experienced early pregnancy loss (EPL). The LR, RF, XGBoost, KNN, MLP, and SVM models achieved AUCs of 0.750, 0.784, 0.750, 0.706, 0.755, and 0.749, respectively, with accuracy and precision exceeding 0.60 for every model. The RF model performed best for EPL prediction, attaining the highest AUC (0.784), accuracy (0.729), and precision (0.724). CONCLUSION: Integrating dynamic changes in β-hCG, P, and E2 enables prediction of early pregnancy outcomes, with a best AUC of 0.784. The RF model performed best among the six models evaluated, suggesting its potential for clinical implementation as a risk stratification tool based on serial hormone monitoring.
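For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, accuracy, precision, sensitivity, and specificity can be computed for a binary classifier. It is not the study's code. The function names `classification_metrics` and `roc_auc`, the coding of EPL as the positive class (1), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are not taken from the study data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from binary labels (assumed coding: 1 = EPL, 0 = ongoing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(classification_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

AUC depends only on how the predicted scores rank the cases, whereas accuracy, precision, sensitivity, and specificity depend on the chosen decision threshold. The pairwise AUC loop is quadratic in sample size, which is acceptable for a cohort of a few hundred patients. In a cross-validated analysis these quantities would typically be computed on each held-out fold and then averaged.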