Interpretable machine learning analysis for relationships between Helicobacter pylori infection and peripheral atherosclerosis: a retrospective cohort study

Abstract

BACKGROUND: Helicobacter pylori (H. pylori) has been implicated in peripheral atherosclerosis (PA); however, its predictive value for PA risk in large population-based cohorts remains insufficiently characterized.

OBJECTIVES: This study aimed to assess the predictive contribution of H. pylori infection to PA risk, in combination with traditional clinical factors, using interpretable machine learning (ML) models.

METHODS: A retrospective cohort of 5,862 individuals undergoing routine health check-ups was analyzed. Demographic data, laboratory indices, and lower-extremity vascular ultrasound findings were collected to determine PA status and H. pylori infection. Key risk factors were identified through univariate and multivariate logistic regression. Subgroup and restricted cubic spline (RCS) analyses were applied to evaluate effect modification and nonlinear associations. Fourteen ML algorithms were developed and evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, Youden's index, and calibration. Models were subsequently retrained using only the regression-identified variables to assess stability. SHapley Additive exPlanations (SHAP) analysis was employed to interpret feature importance across models. Longitudinal and mediation analyses were conducted to explore temporal relationships and the potential mediating role of the triglyceride-glucose (TyG) index.

RESULTS: H. pylori infection was independently associated with PA (odds ratio (OR) = 5.27, 95% CI 4.27-6.54, P < 0.001), alongside male sex (OR = 5.88, 95% CI 4.22-8.29, P < 0.001), smoking (OR = 2.11, 95% CI 1.67-2.67, P < 0.001), and elevated low-density lipoprotein levels (OR = 2.10, 95% CI 1.47-3.04, P < 0.001). Subgroup and RCS analyses demonstrated consistent and nonlinear associations with PA outcomes. ML models using all variables achieved excellent predictive performance, including AUC = 0.993, accuracy = 0.981, sensitivity = 0.954, specificity = 0.990, PPV = 0.966, NPV = 0.986, F1-score = 0.960, and Youden's index = 0.944. After predictors were restricted to the regression-identified variables, the CatBoost model maintained acceptable discrimination, with AUC = 0.754, accuracy = 0.785, sensitivity = 0.342, specificity = 0.925, PPV = 0.621, NPV = 0.823, F1-score = 0.441, and Youden's index = 0.279. SHAP analysis consistently ranked H. pylori infection as the top predictor. Longitudinal analysis revealed a higher proportion of persistent H. pylori infection in emerging PA cases. Mediation analysis indicated a negligible indirect effect of the TyG index (1.11%, P = 0.124).

CONCLUSIONS: H. pylori infection is independently associated with PA and represents a critical contributor to PA risk stratification. The integration of epidemiological analysis with interpretable ML provides a robust framework for identifying high-risk individuals and supports the potential value of incorporating infectious markers into vascular risk assessment.
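For readers less familiar with the threshold-based metrics listed above, the following minimal Python sketch shows how sensitivity, specificity, PPV, NPV, accuracy, F1-score, and Youden's index follow from a binary confusion matrix, using their standard definitions. The class name `ConfusionCounts`, the function name `classification_metrics`, and the example counts are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives: PA present, predicted present
    fp: int  # false positives: PA absent, predicted present
    tn: int  # true negatives: PA absent, predicted absent
    fn: int  # false negatives: PA present, predicted absent


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the threshold-based metrics from confusion-matrix counts."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall among true cases
    specificity = c.tn / (c.tn + c.fp)   # recall among true non-cases
    ppv = c.tp / (c.tp + c.fp)           # precision of positive calls
    npv = c.tn / (c.tn + c.fn)           # precision of negative calls
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    youden = sensitivity + specificity - 1            # Youden's J
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "youden": youden,
    }


# Illustrative counts only (assumed, not taken from the study).
example = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

As a consistency check on the all-variable model, the same definitions applied to the reported values give F1 = 2 × 0.966 × 0.954 / (0.966 + 0.954) ≈ 0.960 and Youden's index = 0.954 + 0.990 - 1 = 0.944, both matching the abstract.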
