Research and implementation of multi-disease diagnosis on chest X-ray based on vision transformer


Abstract

BACKGROUND: Disease diagnosis in chest X-ray images has predominantly relied on convolutional neural networks (CNNs). However, the Vision Transformer (ViT) offers several advantages over CNNs: it excels at capturing long-range dependencies, exploring correlations, and extracting features with richer semantic information.

METHODS: We adapted ViT for chest X-ray image analysis with three key improvements: (I) a sliding-window approach in the image sequence feature extraction module that divides the input image into blocks, helping to identify small, hard-to-detect lesion areas; (II) an attention region selection module in the encoder layers of the ViT, strengthening the model's focus on relevant regions; and (III) a parallel patient metadata feature extraction network built on top of the image feature extraction network, which integrates multi-modal input data and enables the model to jointly learn and enrich image-semantic information.

RESULTS: The proposed model achieved a mean area under the curve (AUC) of 0.831 in diagnosing 14 common chest diseases. The metadata feature network effectively integrated patient metadata, further improving diagnostic accuracy. Overall, the ViT-based model reached a sensitivity of 0.863, a specificity of 0.821, and an accuracy of 0.834 on these diseases.

CONCLUSIONS: Our model has good general applicability and shows promise in chest X-ray image analysis, effectively integrating patient metadata and enhancing diagnostic capability.
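The sliding-window patch extraction (improvement I) and the parallel metadata branch (improvement III) can be sketched in PyTorch as below. This is an illustrative sketch only: the patch size, stride, embedding dimension, metadata fields, and the concatenation-based fusion are assumptions, not values or design details taken from the paper, and the attention region selection module (improvement II) is omitted for brevity.

```python
import torch
import torch.nn as nn


class SlidingWindowPatchEmbed(nn.Module):
    """Sliding-window patch embedding (improvement I, sketched).

    A standard ViT splits the image into non-overlapping patches
    (stride == patch size). Using a stride smaller than the patch size
    slides the window so adjacent blocks overlap, producing a denser
    token sequence that can better cover small lesion areas.
    patch_size=16 and stride=8 are illustrative assumptions.
    """

    def __init__(self, img_size=224, patch_size=16, stride=8,
                 in_chans=1, embed_dim=768):
        super().__init__()
        # Conv2d with kernel == patch_size and the chosen stride performs
        # sliding-window block extraction + linear projection in one step.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=stride)
        self.num_patches = ((img_size - patch_size) // stride + 1) ** 2

    def forward(self, x):
        x = self.proj(x)                  # (B, D, H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence
        return x


class MetadataFusionHead(nn.Module):
    """Parallel metadata branch (improvement III, sketched).

    Patient metadata (e.g. age, sex, view position) is embedded by a
    small MLP and concatenated with the image feature before the
    14-way multi-label classifier. The fusion strategy and all
    dimensions here are assumptions for illustration.
    """

    def __init__(self, img_dim=768, meta_dim=4, hidden=64, num_classes=14):
        super().__init__()
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.classifier = nn.Linear(img_dim + hidden, num_classes)

    def forward(self, img_feat, meta):
        fused = torch.cat([img_feat, self.meta_mlp(meta)], dim=-1)
        return self.classifier(fused)  # per-disease logits
```

With these assumed sizes, a 224x224 single-channel image yields a 27x27 grid of overlapping blocks (729 tokens), versus 196 tokens for a standard non-overlapping 16x16 patching; the denser sequence is what lets small regions appear in several blocks.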
