Comparative Analysis of Transformer Architectures and Ensemble Methods for Automated Glaucoma Screening in Fundus Images from Portable Ophthalmoscopes


Abstract

Deep learning for glaucoma screening often relies on high-resolution clinical images and convolutional neural networks (CNNs). However, these methods suffer significant performance drops when applied to the noisy, low-resolution images produced by portable devices. To address this, our work investigates ensembles of multiple Transformer architectures for automated glaucoma detection in such challenging scenarios. We used the Brazil Glaucoma (BrG) dataset and a private D-Eye dataset to assess model robustness; both contain images typical of smartphone-coupled ophthalmoscopes, which are often noisy and variable in quality. Four Transformer models (Swin-Tiny, ViT-Base, MobileViT-Small, and DeiT-Base) were trained and evaluated both individually and in ensembles. We evaluated the results at both the image and patient levels to reflect clinical practice. The results show that, although performance drops on lower-quality images, ensemble combinations and patient-level aggregation significantly improve accuracy and sensitivity. We achieved up to 85% accuracy and an 84.2% F1-score on the D-Eye dataset, with a notable reduction in false negatives. Grad-CAM attention maps confirmed that the Transformers attend to anatomical regions relevant to diagnosis. These findings reinforce the potential of Transformer ensembles as an accessible solution for early glaucoma detection in populations with limited access to specialized equipment.
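The two aggregation steps described above (combining the per-image scores of several models, then rolling image-level decisions up to a patient-level label) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the soft-voting average, the 0.5 decision threshold, and the majority-vote rule are all assumptions, since the abstract does not specify the exact ensemble or aggregation scheme.

```python
import numpy as np

def ensemble_image_probs(model_probs):
    """Soft-voting ensemble: average per-image glaucoma probabilities
    across models (assumed combination rule, one array per model)."""
    return np.mean(np.stack(model_probs, axis=0), axis=0)

def patient_level_prediction(image_probs, threshold=0.5):
    """Aggregate one patient's per-image probabilities into a single
    label by thresholding each image and taking a majority vote
    (assumed aggregation rule)."""
    votes = (np.asarray(image_probs) >= threshold).astype(int)
    return int(votes.sum() > len(votes) / 2)

# Hypothetical scores from three models over four images of one patient.
model_probs = [
    np.array([0.9, 0.2, 0.8, 0.4]),  # e.g. Swin-Tiny
    np.array([0.7, 0.3, 0.6, 0.5]),  # e.g. ViT-Base
    np.array([0.8, 0.1, 0.7, 0.6]),  # e.g. DeiT-Base
]
fused = ensemble_image_probs(model_probs)       # [0.8, 0.2, 0.7, 0.5]
label = patient_level_prediction(fused)         # 1 (glaucoma suspect)
```

Patient-level aggregation of this kind tends to reduce false negatives, since a single missed image no longer determines the patient's label.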
