Research on key indicators for diagnosis and prediction of rheumatoid arthritis based on GBDT+LR embedded feature selection model

Abstract

BACKGROUND: Rheumatoid arthritis (RA) exhibits substantial diagnostic overlap with other autoimmune diseases that share similar pathological features, leading to redundant testing and limited diagnostic specificity. There is therefore an urgent need to identify clinical indicators with high diagnostic and predictive value to improve both diagnostic efficiency and accuracy.

METHODS: To address this challenge, we propose a multidimensional embedded feature selection framework based on ensemble learning. The framework integrates Gradient Boosted Decision Trees (GBDT) and Logistic Regression (LR) models to extract potential diagnostic features from multi-source clinical datasets. GBDT captures complex nonlinear interactions among features, enhancing adaptability to heterogeneous data, while LR leverages its sparsity-promoting characteristics to perform dimensionality reduction and highlight discriminative variables. To further improve interpretability, the SHapley Additive exPlanations (SHAP) algorithm was employed to quantify the contribution of each feature to the model's predictions and to identify novel diagnostic markers beyond traditional indicators.

RESULTS: Validated on real-world clinical data, the proposed framework achieved excellent diagnostic performance across multiple evaluation metrics, significantly enhancing the specificity and accuracy of RA diagnosis. Compared with conventional diagnostic methods, our model demonstrated marked improvements in test accuracy and area under the receiver operating characteristic curve (AUC). SHAP analysis not only reaffirmed the importance of rheumatoid factor (RF) and anti-cyclic citrullinated peptide (anti-CCP) antibodies but also revealed that systemic metabolic indicators (such as low HDL, elevated bile acids, and altered creatinine) carry independent diagnostic weight. This supports a paradigm shift toward viewing RA as a multi-system inflammatory disorder, enabling earlier clinical suspicion even before classic articular manifestations appear.

CONCLUSION: The proposed multidimensional embedded feature selection framework showed strong diagnostic performance and interpretability in identifying key biomarkers for RA, effectively addressing the issue of indicator redundancy and enhancing diagnostic precision. This pragmatic application of an established GBDT+LR framework, integrated with SHAP for interpretability and built on routine clinical data, offers potential clinical utility in RA diagnosis.
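
The abstract does not spell out how the GBDT and LR stages are coupled, so the following is only a minimal sketch of one common GBDT+LR arrangement: the trees map each sample to a set of leaf indices, the leaf indices are one-hot encoded, and an L1-penalised LR is fitted on that sparse encoding. It assumes scikit-learn is available. The synthetic data, the three-way split, the hyperparameters, and the helper name `leaf_indices` are all illustrative assumptions and are not taken from the study. The SHAP step is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for a table of routine clinical indicators
# (assumption: NOT the study's data, used only to make the sketch runnable).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

# Disjoint subsets: one to fit the trees, one to fit LR, one held out for testing.
X_gbdt, X_rest, y_gbdt, y_rest = train_test_split(
    X, y, test_size=0.6, random_state=0
)
X_lr, X_test, y_lr, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Stage 1: GBDT learns nonlinear feature interactions.
gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)


def leaf_indices(model, features):
    """Return the leaf reached in every tree, shape (n_samples, n_estimators)."""
    # For binary problems apply() has a trailing class axis of size 1.
    return model.apply(features)[:, :, 0]


# Stage 2: one-hot encode the leaves and fit a sparse (L1) logistic regression.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(leaf_indices(gbdt, X_gbdt))

lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lr.fit(encoder.transform(leaf_indices(gbdt, X_lr)), y_lr)

# Evaluate on the held-out split.
proba = lr.predict_proba(encoder.transform(leaf_indices(gbdt, X_test)))[:, 1]
auc = roc_auc_score(y_test, proba)

# Embedded selection signals: tree-based importances and surviving LR weights.
top_features = np.argsort(gbdt.feature_importances_)[::-1][:5]
n_nonzero = int(np.count_nonzero(lr.coef_))

print(f"held-out AUC: {auc:.3f}")
print("top 5 feature indices by GBDT importance:", top_features.tolist())
print("non-zero LR coefficients on leaf features:", n_nonzero)
```

In this arrangement the L1 penalty zeroes out many leaf indicators, which is one way the sparsity-promoting role attributed to LR can be realised. Fitting the trees and the LR on separate subsets is a design choice intended to limit overfitting of the second stage to the first.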
