Abstract
PURPOSE: To develop a more accurate glaucoma grading framework that combines multiple examination modalities, overcoming the limitations of single-modality diagnostic systems in comprehensive glaucoma diagnosis.

METHODS: This paper proposes a novel multi-modal glaucoma grading framework that classifies patients as healthy, mild glaucoma, or moderate-to-severe glaucoma. The method simulates the clinical diagnostic process by leveraging multiple examination modalities and integrating prior knowledge of ocular structure to enhance feature learning. A multi-modal feature fusion framework (M2F3) is developed, in which a multi-layer transformer (MLT) efficiently combines the modalities. A contrastive learning strategy is also employed to further improve feature learning.

RESULTS: Experimental results demonstrate that the proposed M2F3 glaucoma grading method improves Cohen's kappa (κ) coefficient by 0.0465 over state-of-the-art (SOTA) methods on the Glaucoma grAding from Multi-Modality imAges (GAMMA) dataset.

CONCLUSIONS: By integrating multiple examination modalities with prior knowledge of ocular structure, the proposed multi-modal glaucoma grading framework offers a more accurate diagnostic tool and represents a substantial improvement over existing single-modality systems.
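The reported gain is measured with Cohen's kappa (κ), which scores agreement between predicted and reference grades after discounting the agreement expected by chance. The following is a minimal Python sketch of that coefficient, not the evaluation code used for M2F3. It assumes the three grades are encoded as integers 0 (healthy), 1 (mild), and 2 (moderate-to-severe); the optional quadratic weighting, which penalizes larger grade errors more heavily, is an assumption, since the abstract does not state which variant of κ is used. The function name `cohens_kappa` is hypothetical.

```python
def cohens_kappa(y_true, y_pred, num_classes=3, weights=None):
    """Cohen's kappa between two integer label sequences (hypothetical helper).

    weights=None gives the unweighted kappa; weights="quadratic" gives the
    quadratically weighted variant (assumed, not confirmed by the source).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    if weights not in (None, "quadratic"):
        raise ValueError("weights must be None or 'quadratic'")
    k = num_classes

    # Observed confusion matrix: rows are reference grades, columns are predictions.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal counts used to build the chance-expected matrix.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, larger for bigger grade gaps.
            if weights == "quadratic":
                w = (i - j) ** 2 / (k - 1) ** 2
            else:
                w = 0.0 if i == j else 1.0
            expected = row[i] * col[j] / n
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * expected

    # Degenerate case: both raters use a single identical class, so chance
    # disagreement is zero; treat this as perfect agreement.
    if expected_disagreement == 0:
        return 1.0
    return 1.0 - observed_disagreement / expected_disagreement


# Usage: two grading errors (mild <-> moderate-to-severe) out of six eyes.
ref = [0, 1, 2, 0, 1, 2]
pred = [0, 1, 2, 0, 2, 1]
print(cohens_kappa(ref, pred))                       # 0.5
print(cohens_kappa(ref, pred, weights="quadratic"))  # 0.75
```

Under this formulation, a difference of 0.0465 in κ corresponds to a shift in chance-corrected agreement rather than in raw accuracy, which is why κ is a common choice for ordinal grading tasks with imbalanced classes.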