Abstract
Glaucoma remains a leading cause of irreversible blindness worldwide and is caused by progressive damage to the optic nerve head (ONH). Early detection is critically important for preventing vision loss. In this paper, we propose a novel fusion transformer pipeline, SwinCup-DiscNet, which integrates optic disc/cup segmentation with feature-based classification for effective glaucoma screening. The proposed approach uses an attention-augmented U-Net to segment the Optic Disc (OD) and Optic Cup (OC), followed by post-processing with shape descriptors to estimate the vertical Cup-to-Disc Ratio (CDR). In parallel, a Swin Transformer encoder extracts image-level features from the fundus image for glaucoma detection. A probabilistic fusion scheme then merges the structural biomarker (CDR) with the deep features to produce the final glaucoma classification. The framework was evaluated in detail on three widely used, publicly available datasets: LAG, ACRIMA, and DRISHTI-GS. Experimental results show that SwinCup-DiscNet consistently outperforms traditional CNN-based models and segmentation-only methods across all datasets. Assessed with evaluation metrics including the Dice Similarity Coefficient (DSC), IoU, accuracy, F1-score, and Cup-to-Disc Ratio Mean Absolute Error (CDR MAE), the framework proves robust, reliable, and clinically interpretable. These findings indicate that SwinCup-DiscNet is an effective tool for early glaucoma detection in real-world clinical settings.
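The abstract does not specify the exact post-processing used to derive the vertical CDR; as a minimal illustrative sketch (not the paper's implementation), the ratio can be estimated from binary cup and disc segmentation masks by comparing their vertical pixel extents:

```python
import numpy as np

def vertical_cdr(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Estimate the vertical Cup-to-Disc Ratio from binary masks.

    Both inputs are 2-D boolean arrays of the same shape, where True marks
    pixels belonging to the optic cup or optic disc, respectively.
    """
    def vertical_extent(mask: np.ndarray) -> int:
        rows = np.where(mask.any(axis=1))[0]  # row indices containing mask pixels
        if rows.size == 0:
            return 0
        return int(rows[-1] - rows[0] + 1)    # vertical diameter in pixels

    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height

# Toy example: a 6-pixel-tall disc containing a 3-pixel-tall cup.
disc = np.zeros((10, 10), dtype=bool); disc[2:8, 3:8] = True
cup = np.zeros((10, 10), dtype=bool);  cup[3:6, 4:7] = True
print(round(vertical_cdr(cup, disc), 2))  # 0.5
```

In practice the segmentation masks would come from the attention U-Net's predictions, and an ellipse fit to each mask is a common refinement before measuring the vertical diameters.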