MammoViT: A Custom Vision Transformer Architecture for Accurate BIRADS Classification in Mammogram Analysis



Abstract

Background: Breast cancer screening through mammography interpretation is crucial for early detection and improved patient outcomes. However, the manual classification of mammograms using the BIRADS (Breast Imaging-Reporting and Data System) remains challenging due to subtle imaging features, inter-reader variability, and increasing radiologist workload. Traditional computer-aided detection systems often struggle with complex feature extraction and contextual understanding of mammographic abnormalities. To address these limitations, this study proposes MammoViT, a novel hybrid deep learning framework that leverages both ResNet50's hierarchical feature extraction capabilities and the Vision Transformer's ability to capture long-range dependencies in images.
Methods: We implemented a multi-stage approach utilizing a pre-trained ResNet50 model for initial feature extraction from mammogram images. To address the significant class imbalance in our four-class BIRADS dataset, we applied SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic samples for minority classes. The extracted feature arrays were transformed into non-overlapping patches with positional encodings for Vision Transformer processing. The Vision Transformer employs multi-head self-attention mechanisms to capture both local and global relationships between image patches, with each attention head learning different aspects of spatial dependencies. The model was optimized using Keras Tuner and trained using 5-fold cross-validation with early stopping to prevent overfitting.
Results: MammoViT achieved 97.4% accuracy in classifying mammogram images across different BIRADS categories. The model's effectiveness was validated through comprehensive evaluation metrics, including a classification report, confusion matrix, probability distribution, and comparison with existing studies.
Conclusions: MammoViT effectively combines ResNet50 and Vision Transformer architectures while addressing the challenge of imbalanced medical imaging datasets. The high accuracy and robust performance demonstrate its potential as a reliable tool for supporting clinical decision-making in breast cancer screening.
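
The abstract does not give implementation details for the SMOTE step, so the following is only a minimal sketch of the general SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. The function name `smote_oversample`, its parameters, and the toy 2-D vectors (standing in for ResNet50 feature arrays) are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation.

    minority: list of equal-length feature vectors (lists of floats),
    all belonging to the same minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D vectors standing in for extracted features (hypothetical data).
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(minority_features, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region those samples span. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn library, would normally be used instead of a hand-written version.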
