Abstract
BACKGROUND: Helicobacter pylori (H. pylori) has been implicated in peripheral atherosclerosis (PA); however, its predictive value for PA risk in large population-based cohorts remains insufficiently characterized. OBJECTIVES: This study aimed to assess the predictive contribution of H. pylori infection to PA risk, in combination with traditional clinical factors, using interpretable machine learning (ML) models. METHODS: A retrospective cohort of 5,862 individuals undergoing routine health check-ups was analyzed. Demographic data, laboratory indices, and lower-extremity vascular ultrasound findings were collected to determine PA status and H. pylori infection status. Key risk factors were identified through univariate and multivariate logistic regression. Subgroup and restricted cubic spline (RCS) analyses were applied to evaluate effect modification and nonlinear associations. Fourteen ML algorithms were developed and evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, Youden's index, and calibration. Models were subsequently retrained using only the regression-identified variables to assess stability. SHapley Additive exPlanations (SHAP) analysis was employed to interpret feature importance across models. Longitudinal and mediation analyses were conducted to explore temporal relationships and the potential mediating role of the triglyceride-glucose (TyG) index. RESULTS: H. pylori infection was independently associated with PA (odds ratio (OR) = 5.27, 95% CI = 4.27-6.54, P < 0.001), alongside male sex (OR = 5.88, 95% CI = 4.22-8.29, P < 0.001), smoking (OR = 2.11, 95% CI = 1.67-2.67, P < 0.001), and elevated low-density lipoprotein levels (OR = 2.10, 95% CI = 1.47-3.04, P < 0.001). Subgroup analyses showed consistent associations with PA, and RCS analyses showed nonlinear associations. 
ML models using all variables achieved excellent predictive performance (AUC = 0.993, accuracy = 0.981, sensitivity = 0.954, specificity = 0.990, PPV = 0.966, NPV = 0.986, F1-score = 0.960, and Youden's index = 0.944). After predictors were restricted to the regression-identified variables, the CatBoost model maintained acceptable discrimination (AUC = 0.754, accuracy = 0.785, sensitivity = 0.342, specificity = 0.925, PPV = 0.621, NPV = 0.823, F1-score = 0.441, and Youden's index = 0.279). SHAP analysis consistently ranked H. pylori infection as the top predictor. Longitudinal analysis revealed a higher proportion of persistent H. pylori infection among emerging PA cases. Mediation analysis indicated a negligible indirect effect through the TyG index (1.11% of the total effect, P = 0.124). CONCLUSIONS: H. pylori infection is independently associated with PA and is a critical contributor to PA risk stratification. Integrating epidemiological analysis with interpretable ML provides a robust framework for identifying high-risk individuals and supports the potential value of incorporating infectious markers into vascular risk assessment.
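
For readers less familiar with the evaluation metrics listed above, the following is a minimal sketch of their standard definitions for a binary classifier. It is not the study's code: the `ConfusionCounts` class, the `classification_metrics` function, and the example counts are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives (PA present, predicted present)
    fp: int  # false positives (PA absent, predicted present)
    tn: int  # true negatives (PA absent, predicted absent)
    fn: int  # false negatives (PA present, predicted absent)


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute threshold-based metrics; assumes every denominator is non-zero."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Youden's index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "youden": youden,
    }


# Illustrative counts only; these are not data from the study.
example = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)
metrics = classification_metrics(example)
```

Under these definitions, the values reported for the all-variable model are mutually consistent: 0.954 + 0.990 - 1 = 0.944 for Youden's index.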