Abstract
Diabetic Retinopathy (DR), a serious complication of diabetes, is one of the leading causes of blindness worldwide. Early detection and accurate assessment of DR severity are essential to prevent permanent vision loss and to ensure timely medical intervention. Conventional diagnostic techniques rely on experts manually examining retinal images, which is time-consuming and prone to subjectivity. Although Artificial Intelligence (AI) technologies have emerged as attractive alternatives, current approaches frequently suffer from unbalanced classification performance, poor generalizability, and difficulty differentiating between DR severity levels. This work proposes a hybrid Deep Learning (DL) model, D-TNet, which combines DenseNet121 for spatial feature extraction with a Transformer architecture for modeling long-range contextual dependencies, in order to overcome these issues. The model captures key DR indicators such as microaneurysms, hemorrhages, and neovascularization. Retinal images from the APTOS2019 and Messidor-2 datasets were used to train and test the hybrid model on five-class DR severity grading: healthy retina (No DR), Mild, Moderate, Severe, and Proliferative. Evaluated on these two benchmark datasets, the model achieved 97% accuracy, 0.94 F1-score, and 0.93 kappa score on APTOS2019, and 86% accuracy, 0.79 F1-score, and 0.80 kappa score on Messidor-2. These results demonstrate robust and balanced classification across all five DR severity stages, mitigating the sensitivity and specificity issues frequently seen in traditional AI-based techniques. The proposed method has the potential to substantially improve diabetic eye care by offering reliable and scalable DR detection, especially in resource-limited settings.
Future work includes domain adaptation for cross-dataset validation and real-world deployment, along with the integration of multimodal data such as blood sugar level, fasting glucose, and HbA1c to enhance diagnostic precision.