Abstract
BACKGROUND: Most clinical prediction models for assisted reproductive technology focus primarily on female ovarian reserve markers and often under-represent male factors and the metabolic status of both partners. Additionally, traditional parametric models may have limited ability to capture nonlinear patterns within reproductive data. This study aimed to develop and validate a machine learning (ML)-based model to predict clinical pregnancy outcomes in couples with male factor infertility undergoing IVF/ICSI, and to explore model interpretability using SHapley Additive exPlanations (SHAP).
METHODS: This retrospective study analyzed 2,565 couples undergoing their first IVF/ICSI cycle for male factor infertility at Shanghai First Maternity and Infant Hospital between 2019 and 2025. The cohort was partitioned according to embryo transfer date, with the first 70% of cases assigned to the training set and the remaining 30% reserved as a temporal internal validation set. Feature selection was conducted using LASSO regression within the training set. Seven ML models, including LightGBM and logistic regression, were developed and optimized through 5-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, Brier score, and decision curve analysis. SHAP was employed to provide a visual interpretation of the optimal model.
RESULTS: Five predictors were selected in the training set: female BMI, male BMI, basal FSH, AMH, and female age. In the temporal validation set, all models demonstrated comparable discriminative performance (AUC range: 0.840-0.857). LightGBM achieved an AUC of 0.857 (95% CI: 0.830-0.882), with an accuracy of 0.775 and a specificity of 0.909. DeLong tests indicated no statistically significant differences in AUC between LightGBM and Random Forest (P = 0.918), XGBoost (P = 0.985), or logistic regression (P = 0.067). Based on its overall stability across discrimination, calibration (Brier score = 0.145), and clinical utility, LightGBM was selected for interpretability analysis.
CONCLUSIONS: A LightGBM-based prediction model demonstrated reasonable performance for predicting IVF/ICSI outcomes in couples with male factor infertility. Within this dataset, couple-level metabolic features were strongly associated with model predictions alongside traditional ovarian reserve markers. These findings reflect predictive associations rather than causal effects and suggest that metabolic characteristics may warrant consideration in risk stratification and counseling. Prospective studies are needed to determine whether targeted interventions can improve clinical outcomes.
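
To illustrate two steps named in the Methods, the sketch below shows a date-ordered 70/30 partition and the Brier score computed as the mean squared difference between predicted probability and observed outcome. It is a minimal illustration and not the study's code: the field names `transfer_date` and `pregnant`, the helper names, and the toy records are all hypothetical, and the abstract does not specify how cases at the 70% boundary were handled.

```python
from datetime import date, timedelta


def temporal_split(records, date_key="transfer_date", train_fraction=0.7):
    """Sort records by date; the earliest fraction becomes the training set."""
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = round(len(ordered) * train_fraction)  # assumption: round at the boundary
    return ordered[:cut], ordered[cut:]


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy, made-up records in shuffled date order (not data from the study).
offsets = [5, 2, 9, 0, 7, 3, 8, 1, 6, 4]
records = [
    {"transfer_date": date(2019, 1, 1) + timedelta(days=30 * o), "pregnant": o % 2}
    for o in offsets
]

train, valid = temporal_split(records)

# Every training case precedes every validation case in time.
latest_train = max(r["transfer_date"] for r in train)
earliest_valid = min(r["transfer_date"] for r in valid)

# Brier score for an uninformative predictor that always outputs 0.5.
baseline_brier = brier_score([1, 0], [0.5, 0.5])
```

Splitting by date instead of at random means the validation cases are always later than the training cases, which is what makes the held-out set a temporal validation. For the Brier score, lower is better, and a constant prediction of 0.5 scores 0.25, which gives a rough reference point for the reported value of 0.145.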