Visual Transformers and Convolutional Neural Networks for Disease Classification on Radiographs: A Comparison of Performance, Sample Efficiency, and Hidden Stratification

Abstract

PURPOSE: To compare performance, sample efficiency, and hidden stratification of visual transformer (ViT) and convolutional neural network (CNN) architectures for diagnosis of disease on chest radiographs and extremity radiographs using transfer learning.

MATERIALS AND METHODS: In this HIPAA-compliant retrospective study, the authors fine-tuned data-efficient image transformers (DeiT) ViT and CNN classification models pretrained on ImageNet using the National Institutes of Health Chest X-ray 14 dataset (112 120 images) and MURA dataset (14 656 images) for thoracic disease and extremity abnormalities, respectively. Performance was assessed on internal test sets and 75 000 external chest radiographs (three datasets). The primary comparison was DeiT-B ViT vs DenseNet121 CNN; secondary comparisons included DeiT-Ti (Tiny), ResNet152, and EfficientNetB7. Sample efficiency was evaluated by training models on varying dataset sizes. Hidden stratification was evaluated by comparing prevalence of chest tubes in pneumothorax false-positive and false-negative predictions and specific abnormalities for MURA false-negative predictions.

RESULTS: DeiT-B weighted area under the receiver operating characteristic curve (wAUC) was slightly lower than that for DenseNet121 on chest radiograph (0.78 vs 0.79; P < .001) and extremity (0.887 vs 0.893; P < .001) internal test sets and chest radiograph external test sets (P < .001 for each). DeiT-B and DeiT-Ti both performed slightly worse than all CNNs for chest radiograph and extremity tasks. DeiT-B and DenseNet121 showed similar sample efficiency. DeiT-B had lower chest tube prevalence in false-positive predictions than DenseNet121 (43.1% [2194 of 5088] vs 47.9% [2290 of 4782]).

CONCLUSION: Although DeiT models had lower wAUCs than CNNs for chest radiograph and extremity domains, the differences may be negligible in clinical practice. DeiT-B had sample efficiency similar to that of DenseNet121 and may be less susceptible to certain types of hidden stratification.

Keywords: Computer-aided Diagnosis, Informatics, Neural Networks, Thorax, Skeletal-Appendicular, Convolutional Neural Network (CNN), Feature Detection, Supervised Learning, Machine Learning, Deep Learning

Supplemental material is available for this article.

© RSNA, 2022.
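
The abstract reports weighted AUC (wAUC) without stating how the weights are defined. As an illustration only, the sketch below assumes that each finding's AUC is weighted by its number of positive cases, which is one common convention for multilabel tasks; the authors' exact weighting may differ. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one finding: the fraction of (positive, negative) pairs in which
    the positive case gets the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_auc(
    labels_by_finding: Dict[str, Sequence[int]],
    scores_by_finding: Dict[str, Sequence[float]],
) -> float:
    """Average of per-finding AUCs, weighted by positive-case count.
    ASSUMPTION: this weighting is illustrative, not confirmed by the abstract."""
    total_pos = 0
    weighted_sum = 0.0
    for finding, labels in labels_by_finding.items():
        n_pos = sum(labels)
        weighted_sum += n_pos * binary_auc(labels, scores_by_finding[finding])
        total_pos += n_pos
    return weighted_sum / total_pos


# Hypothetical toy data: two findings, four radiographs each.
toy_labels = {"pneumothorax": [1, 0, 0, 1], "effusion": [1, 1, 1, 0]}
toy_scores = {"pneumothorax": [0.9, 0.2, 0.6, 0.5], "effusion": [0.8, 0.4, 0.7, 0.5]}

# Per-finding AUCs are 0.75 (2 positives) and 2/3 (3 positives).
print(round(weighted_auc(toy_labels, toy_scores), 3))  # 0.7
```

The pairwise loop is quadratic in the number of cases, so it suits small illustrations only; for datasets of the size described here, a rank-based implementation from an established library would be the practical choice.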
