Abstract
OBJECTIVE: To assess the diagnostic performance of serum prostate-specific antigen (PSA), the Prostate Health Index (PHI), and peripheral blood inflammatory markers, namely the neutrophil-lymphocyte ratio (NLR), lymphocyte-monocyte ratio (LMR), neutrophil-apolipoprotein A1 ratio (NAR), and apolipoprotein A1 (ApoA1), in differentiating prostate cancer (PCa) from biopsy-negative benign prostatic hyperplasia (BPH), and to construct an optimized machine learning diagnostic model. METHODS: A retrospective analysis was conducted on 701 patients referred for prostate biopsy between March 2018 and January 2024, comprising 421 PCa and 280 BPH cases. Patients were divided into training (60%; n=421), validation (20%; n=140), and test (20%; n=140) cohorts. LASSO regression identified key predictors, which were used to develop five machine learning models: logistic regression, decision tree, random forest, support vector machine, and XGBoost. Model performance was evaluated using receiver operating characteristic (ROC) and precision-recall curves, calibration plots, Brier scores, and decision curve analysis (DCA). Areas under the ROC curve (AUCs) were compared using the DeLong test. RESULTS: PCa patients had higher PSA, neutrophil count (Neu), monocyte count (MONO), NLR, NAR, and PHI, but lower ApoA1 and LMR, than BPH patients (all P<0.05). XGBoost achieved the best performance (AUC: training 0.994; validation 0.953; test 0.979), significantly surpassing PSA (AUC difference: 0.055-0.118, P<0.001) and PHI (AUC difference: 0.077-0.084, P<0.007). The model was well calibrated, with low Brier scores (0.0326-0.0751). DCA confirmed a superior net clinical benefit. NLR and NAR were the major contributors to PCa risk prediction. CONCLUSIONS: The XGBoost model integrating NLR, LMR, and NAR demonstrates superior diagnostic accuracy and clinical utility compared with PSA and PHI, potentially improving pre-biopsy risk stratification and reducing unnecessary invasive procedures.
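To make the derived quantities concrete, the following is a minimal illustrative sketch, not the study's code. It computes the three ratios from their names (NLR = neutrophils/lymphocytes, LMR = lymphocytes/monocytes, NAR = neutrophils/ApoA1), the Brier score as the mean squared difference between predicted probability and outcome, and one possible way to arrive at the reported 60/20/20 cohort sizes. The names `BloodPanel`, `inflammatory_ratios`, `brier_score`, and `split_sizes` are hypothetical, and both the measurement units and the rounding rule for the split are assumptions, since the abstract states neither.

```python
from dataclasses import dataclass


@dataclass
class BloodPanel:
    """Hypothetical container; units are assumed, not taken from the study."""
    neutrophils: float  # assumed 10^9 cells/L
    lymphocytes: float  # assumed 10^9 cells/L
    monocytes: float    # assumed 10^9 cells/L
    apoa1: float        # assumed g/L


def inflammatory_ratios(panel: BloodPanel) -> dict:
    """Compute NLR, LMR, and NAR as plain quotients of the raw values."""
    return {
        "NLR": panel.neutrophils / panel.lymphocytes,
        "LMR": panel.lymphocytes / panel.monocytes,
        "NAR": panel.neutrophils / panel.apoa1,
    }


def brier_score(y_true, y_prob) -> float:
    """Mean squared error between binary outcomes (0/1) and predicted probabilities."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def split_sizes(n: int, train_frac: float = 0.6, val_frac: float = 0.2) -> tuple:
    """Assumed rounding rule: round train and validation, give the rest to test."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return n_train, n_val, n - n_train - n_val


if __name__ == "__main__":
    print(inflammatory_ratios(BloodPanel(4.0, 2.0, 0.5, 1.25)))
    print(brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
    print(split_sizes(701))  # (421, 140, 140), matching the reported cohort sizes
```

A Brier score of 0 corresponds to perfect probabilistic predictions, and a model that always predicts 0.5 scores 0.25, which gives a rough scale for the reported range of 0.0326-0.0751.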