Abstract
Diabetic Retinopathy (DR) is a leading cause of blindness worldwide, and its early detection and accurate grading play a crucial role in clinical intervention. To address the dual limitations of existing methods in multi-scale lesion feature fusion and lesion relationship modeling, this study proposes a novel adaptive multi-scale convolutional neural network for fine-grained DR grading, called MAFNet (Multi-scale Adaptive Fine-grained Network). The model builds a multi-scale feature integration framework from three core modules: the Hierarchical Global Context Module (HGCM) effectively expands the receptive field through multi-scale pooling and dynamic feature fusion, capturing lesion features from micro-scale to large-scale regions; the Multi-scale Adaptive Attention Module (MSAM) uses an adaptive attention mechanism to dynamically adjust feature weights at different spatial locations, enhancing the representation of key lesion regions; and the Relational Multi-head Attention Module (RMA) applies multi-head attention to model complex relationships between features in parallel, improving the accuracy of fine-grained lesion identification. Furthermore, MAFNet adopts a multi-task learning framework that casts DR grading as a joint regression and classification task, thereby effectively capturing the progressive nature of DR. Extensive experiments on three publicly available datasets, DDR, Messidor-2, and APTOS, show that MAFNet achieves quadratic weighted Kappa values of 0.934, 0.917, and 0.936, respectively, outperforming existing DR grading methods such as LANet and MPLNet and demonstrating its practical value in automated DR grading.
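All reported results use the quadratic weighted Kappa, the standard agreement metric for ordinal DR grades (0–4), which penalizes disagreements by the squared distance between predicted and true grades. The following is a minimal NumPy sketch of the metric for context; it is a standard formulation, not code from the MAFNet implementation.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted Kappa for ordinal labels in {0, ..., n_classes-1}."""
    # Observed confusion matrix O
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: w_ij = (i - j)^2 / (N - 1)^2
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix E under chance agreement, scaled to the same total as O
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement yields 1.0, chance-level agreement 0.0, and values near the reported 0.93 indicate that most predictions fall on or adjacent to the true grade. The same quantity is available as `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")`.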