Abstract
PURPOSE: Can machine learning effectively predict the number of mature oocytes retrieved after controlled ovarian stimulation in patients undergoing in vitro fertilization or intracytoplasmic sperm injection? DESIGN: This retrospective study included data from 24,976 infertility patients treated between 2018 and 2022 at a single reproductive center. After data preprocessing, feature selection was performed using correlation analysis, LASSO regression, and random forest-based recursive feature elimination. Eight machine learning models were developed to predict the number of mature oocytes retrieved. Model performance was evaluated using root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). Calibration was assessed on the testing set using the calibration intercept and slope. Clinical validation was conducted using an independent 2022 dataset. The best-performing model was interpreted using SHapley Additive exPlanations (SHAP) values. A web-based calculator was developed to support clinical application. RESULTS: Six predictors were identified: antral follicle count; follicle-stimulating hormone and estradiol at stimulation start; estradiol on the trigger day; number of large follicles on the trigger day; and age. The multilayer perceptron model performed best, with an RMSE of 3.675, an MAE of 2.702, and an R² of 0.714 in clinical validation. Calibration analysis showed good agreement between predicted and observed values. SHAP interpretation revealed the estradiol level and the number of large follicles on the trigger day as the strongest predictors. CONCLUSIONS: We developed a machine learning model that accurately predicts mature oocyte yield. A web-based calculator was also created to enable individualized prediction and enhance clinical usability.
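The evaluation metrics named in the DESIGN section can be illustrated with a minimal sketch. The function names `regression_metrics` and `calibration_line` and the toy oocyte counts are hypothetical and are not taken from the study. The abstract does not state how the calibration intercept and slope were estimated; the sketch assumes an ordinary least-squares fit of observed on predicted values, for which an intercept near 0 and a slope near 1 would indicate good calibration.

```python
import math


def regression_metrics(observed, predicted):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


def calibration_line(observed, predicted):
    """Least-squares fit of observed = intercept + slope * predicted.

    Assumption: this is one common way to obtain a calibration intercept
    and slope for a continuous outcome; the study may have used another.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0:
        raise ValueError("slope is undefined when all predictions are equal")
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return intercept, slope


# Hypothetical toy data: mature oocyte counts for five cycles.
observed = [4, 8, 12, 6, 10]
predicted = [5, 7, 11, 7, 10]

rmse, mae, r2 = regression_metrics(observed, predicted)
intercept, slope = calibration_line(observed, predicted)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
print(f"calibration intercept={intercept:.3f} slope={slope:.3f}")
```

RMSE and MAE are both expressed in oocytes, so the reported values of 3.675 and 2.702 can be read directly as typical prediction errors per cycle; RMSE penalizes large misses more heavily than MAE.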