Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study



Abstract

BACKGROUND: While deep learning classifiers have shown remarkable results in detecting chest X-ray (CXR) pathologies, their adoption in clinical settings is often hampered by a lack of transparency. To bridge this gap, this study introduces the neural prototype tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models with the interpretability of a decision tree for CXR pathology detection.

OBJECTIVE: This study aimed to investigate the utility of the NPT classifier along 3 dimensions (performance, interpretability, and fairness) and to examine the complex interaction between these dimensions. We highlight both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings.

METHODS: This study used CXRs from the publicly available Chest X-ray 14, CheXpert, and MIMIC-CXR datasets. We trained 6 separate classifiers for each CXR pathology in all datasets: 1 baseline residual neural network (ResNet)-152 and 5 NPT classifiers with varying levels of interpretability. Performance, interpretability, and fairness were measured using the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity.

RESULTS: The performance of the NPT classifier improved as the IC level increased, surpassing that of ResNet-152 at IC level 15 for the Chest X-ray 14 dataset and at IC level 31 for the CheXpert and MIMIC-CXR datasets. The NPT classifier at IC level 1 exhibited the highest degree of unfairness, as indicated by the mean TPR disparity. The magnitude of unfairness, as measured by the mean TPR disparity, was more pronounced in groups differentiated by age (Chest X-ray 14: 0.112, SD 0.015; CheXpert: 0.097, SD 0.010; MIMIC-CXR: 0.093, SD 0.017) than in groups differentiated by sex (Chest X-ray 14: 0.054, SD 0.012; CheXpert: 0.062, SD 0.008; MIMIC-CXR: 0.066, SD 0.013). A significant positive relationship between interpretability (ie, IC level) and performance (ie, ROC AUC) was observed across all CXR pathologies (P<.001). Furthermore, linear regression analysis revealed a significant negative relationship between interpretability and fairness (ie, mean TPR disparity) across age and sex subgroups (P<.001).

CONCLUSIONS: By illuminating the intricate relationship between performance, interpretability, and fairness of the NPT classifier, this research offers insightful perspectives that could guide future developments in effective, interpretable, and equitable deep learning classifiers for CXR pathology detection.
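The abstract reports fairness as a mean TPR disparity but does not spell out the formula. As a rough illustration only, the sketch below assumes one common reading: the disparity of a subgroup is the absolute gap between that subgroup's TPR and the overall TPR, and the mean is taken over subgroups. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN). Returns None when there are no positive cases."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return None
    return sum(preds_for_positives) / len(preds_for_positives)


def mean_tpr_disparity(y_true, y_pred, groups):
    """Mean absolute gap between each subgroup's TPR and the overall TPR.

    ASSUMPTION: this definition is illustrative; the study may define
    the disparity differently (for example, relative to a median TPR).
    """
    overall = true_positive_rate(y_true, y_pred)
    if overall is None:
        raise ValueError("no positive cases; TPR is undefined")

    # Collect labels and binary predictions per subgroup (e.g. sex or age bin).
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)

    gaps = []
    for labels, preds in by_group.values():
        tpr = true_positive_rate(labels, preds)
        if tpr is not None:  # skip subgroups without positive cases
            gaps.append(abs(tpr - overall))
    return sum(gaps) / len(gaps)


# Toy data: binary labels, thresholded predictions, and a sex attribute.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
sex = ["F", "F", "M", "M", "F", "M"]
disparity = mean_tpr_disparity(y_true, y_pred, sex)
```

In this toy case the overall TPR is 0.75, the TPR for "F" is 1.0, and the TPR for "M" is 0.5, so both subgroups sit 0.25 away from the overall rate. A value closer to 0 indicates more even sensitivity across subgroups under this assumed definition.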
