Identifying gastric intestinal metaplasia risk based on clinical indicators: a machine learning predictive model based on the SHAP methodology

Abstract

BACKGROUND: Screening for gastric intestinal metaplasia (GIM) is important for the early detection of gastric cancer. To help clinicians identify patients at high risk of GIM and decide when to perform a gastric mucosal biopsy, we aimed to develop a model that predicts the occurrence of GIM.

METHODS: Patients were enrolled at the First Affiliated Hospital of Dalian Medical University according to strict inclusion and exclusion criteria. First, the VarSelRF algorithm (random forest-based variable selection) was used to identify independent variables associated with GIM development. Eight machine learning algorithms were then used to construct predictive models: Decision Tree (DT), Elastic Net (ENet), K-Nearest Neighbors (KNN), LightGBM, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). Their performance was compared using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) values were applied to interpret the RF model by quantifying the contribution of each feature to its predictions. Finally, a web-based calculator was built on the RF model to support clinical use.

RESULTS: Of the 975 patients included, 322 had pathologically confirmed GIM. Eleven independent variables contributed significantly to GIM occurrence: gastric mucosal atrophy, H. pylori infection, direct bilirubin (DBIL), creatinine (Crea), smoking history, alcohol history, gender, alanine aminotransferase (ALT), age, albumin/globulin ratio (ALB/GLO), and gamma-glutamyltransferase (GGT). Among the eight algorithms tested, the RF model performed strongly, with an AUC of 0.8167 on the test set, a specificity of 85.5%, and a sensitivity of 57.0%. SHAP values enhanced the model's interpretability, helping clinicians understand how it reaches its decisions. The resulting web-based calculator serves as a practical tool for clinicians.

CONCLUSION: This study highlights a novel use of serological biomarkers to assess GIM risk. Several markers of liver and kidney function were found to be strong predictors of GIM development. SHAP values clarify how each feature contributes to a prediction, and the web-based calculator gives clinicians a practical way to evaluate GIM risk.
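
The reported AUC, sensitivity, specificity, and decision-curve results follow standard definitions. The minimal Python sketch below computes them from scratch on a small, hypothetical set of biopsy labels and predicted probabilities. The data, the 0.5 classification threshold, the 0.25 threshold probability, and the function names are illustrative assumptions; they are not the study's data, cut-offs, or code.

```python
"""Toy sketch of the evaluation metrics named in the abstract.

All labels and predicted probabilities are hypothetical, not study data.
"""
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney form); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).
    Assumes both classes are present in `labels`."""
    tp, fp, tn, fn = confusion(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def net_benefit(labels, scores, pt):
    """Decision-curve net benefit at threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion(labels, scores, pt)
    n = len(labels)
    return tp / n - (fp / n) * pt / (1 - pt)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = GIM confirmed on biopsy, 0 = no GIM.
    y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    p = [0.9, 0.7, 0.4, 0.3, 0.6, 0.35, 0.2, 0.15, 0.1, 0.05]
    print(f"AUC = {auc(y, p):.3f}")  # AUC = 0.875
    sens, spec = sensitivity_specificity(y, p, 0.5)
    print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
    # "Treat all" is the same formula with every patient called positive.
    print(f"model NB = {net_benefit(y, p, 0.25):.3f}, "
          f"treat-all NB = {net_benefit(y, [1.0] * len(y), 0.25):.3f}")
```

A DCA curve evaluates the net benefit across a range of threshold probabilities and compares the model with the treat-all and treat-none (net benefit 0) strategies. Moving the classification threshold trades sensitivity against specificity, which is why a single AUC can correspond to many sensitivity/specificity pairs.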
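
SHAP attributions are Shapley values: each feature's average marginal contribution to the model output over all orderings of the features, so the attributions add up to the difference between a patient's prediction and a reference output. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical three-feature risk score, filling features absent from a coalition with a single hypothetical baseline patient. The scoring function `toy_risk`, its coefficients, the feature names, and the baseline are illustrative assumptions. The study explained a random forest, for which tree-specific algorithms such as TreeSHAP are typically used instead of enumeration.

```python
"""Exact Shapley-value attribution for a small, hypothetical risk model.

Enumerates every feature coalition, so it is only practical for a handful
of features. Features outside a coalition take a single baseline value.
"""
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    names = list(x)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        # Coalition members take the patient's values; the rest take the baseline.
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(z)

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z: Features) -> float:
    """Hypothetical risk score with one interaction; NOT the study's RF model."""
    score = 0.10 + 0.30 * z["atrophy"] + 0.20 * z["h_pylori"] + 0.01 * (z["age"] - 50)
    score += 0.10 * z["atrophy"] * z["h_pylori"]  # atrophy x H. pylori interaction
    return score


if __name__ == "__main__":
    patient = {"atrophy": 1.0, "h_pylori": 1.0, "age": 60.0}
    reference = {"atrophy": 0.0, "h_pylori": 0.0, "age": 50.0}
    phi = shapley_values(toy_risk, patient, reference)
    for name, contribution in phi.items():
        print(f"{name}: {contribution:+.2f}")
    print(f"sum = {sum(phi.values()):.2f}, "
          f"f(x) - f(baseline) = {toy_risk(patient) - toy_risk(reference):.2f}")
```

In this toy example the interaction between atrophy and H. pylori is split equally between the two features, and the three attributions sum exactly to the gap between the patient's score and the baseline score, which is the additivity property that makes SHAP plots readable.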
