Disease classification via interpretable machine learning based on multi-center routine coagulation test



Abstract

BACKGROUND: This study aims to establish an interpretable disease classification model via machine learning and identify key features related to the disease to assist clinical disease diagnosis based on a multi-center CX9000 routine coagulation test.

METHODS: Data from 11 hospitals were collected. An unsupervised clustering model was used to extract classification patterns, and clinical experts assigned disease labels. Multiple machine learning models, including Random Forest, SVM, Decision Tree, Naive Bayes, MLP, XGBoost, and LightGBM, were trained. Ten-fold cross validation and external validation were performed. For external validation, models were trained with data from 8 hospitals (~90%) and tested on the remaining 2 hospitals (~10%). SHAP and Decision Tree analysis were used for interpretability.

RESULTS: Clear clustering patterns were observed for valvular heart disease (VHD) and pulmonary infection (PI). LightGBM achieved the best performance in both tasks. In cross validation, the mean F1-scores were 0.8890 and 0.7233, and the mean AUCs were 0.9500 and 0.8023. External validation showed strong generalization, with mean F1-scores of 0.9259 and 0.7464 and mean AUCs of 0.9493 and 0.8297. The sample visualization by t-SNE and the interpretable analysis by SHAP and Decision Trees identified some key classification features, i.e., international normalized ratio (INR) for VHD and age for PI.

CONCLUSION: Machine learning models based on multi-center coagulation tests provide effective and interpretable disease classification, supporting clinical diagnostic automation.
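The external validation described in the abstract holds out whole hospitals rather than individual samples, so no site contributes to both training and testing. The following is a minimal sketch of that idea in plain Python, not the study's pipeline: the record fields (`hospital`, `inr`, `label`), the hospital names, and all values are made up for illustration, and a simple INR threshold stands in for the LightGBM classifier the abstract reports.

```python
def split_by_hospital(records, test_hospitals):
    """Partition records so that no hospital appears in both sets."""
    train, test = [], []
    for rec in records:
        (test if rec["hospital"] in test_hospitals else train).append(rec)
    return train, test


def fit_threshold(rows):
    """Stand-in 'model': midpoint between the class means of INR."""
    pos = [r["inr"] for r in rows if r["label"] == 1]
    neg = [r["inr"] for r in rows if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy records (hypothetical values, not data from the study).
records = [
    {"hospital": "H1", "inr": 2.6, "label": 1},
    {"hospital": "H1", "inr": 1.0, "label": 0},
    {"hospital": "H2", "inr": 2.9, "label": 1},
    {"hospital": "H2", "inr": 1.1, "label": 0},
    {"hospital": "H3", "inr": 1.4, "label": 1},
    {"hospital": "H3", "inr": 0.9, "label": 0},
    {"hospital": "H3", "inr": 2.2, "label": 1},
]

# Hold out hospital H3 entirely; fit on H1 and H2 only.
train, test = split_by_hospital(records, test_hospitals={"H3"})
threshold = fit_threshold(train)

y_true = [r["label"] for r in test]
y_pred = [1 if r["inr"] >= threshold else 0 for r in test]
external_f1 = f1_score(y_true, y_pred)
print(f"held-out hospital F1: {external_f1:.4f}")
```

Splitting at the hospital level means the score reflects performance on sites the model has never seen, which is the sense in which the abstract speaks of generalization. With a real classifier, the same split would simply replace `fit_threshold` and the thresholding step.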
