Abstract
Background: Visual impairment remains a critical public health challenge, and diabetic retinopathy (DR) is a leading cause of preventable blindness worldwide. Early stages of the disease are particularly difficult to identify because lesions are subtle, expert review is time-consuming, and conventional diagnostic workflows remain subjective. Methods: To address these challenges, we propose PAW-Net, a novel Pixel-Attention W-shaped deep learning framework that integrates a Lesion-Prior Cross Attention (LPCA) module with a W-shaped encoder-decoder architecture. The LPCA module enhances the pixel-level representation of microaneurysms, hemorrhages, and exudates, while the dual-branch W-shaped design jointly performs lesion segmentation and disease severity grading in a single, clinically interpretable pass. The framework was trained and validated on DDR and a preprocessed Messidor + EyePACS dataset, with APTOS-2019 reserved for external, out-of-distribution evaluation. Results: The proposed PAW-Net framework achieved robust performance across severity levels, with an accuracy of 98.65%, precision of 98.42%, recall (sensitivity) of 98.83%, specificity of 99.12%, F1-score of 98.61%, and a Dice coefficient of 98.61%. Comparative analyses demonstrate consistent improvements over contemporary architectures, particularly in accuracy and F1-score. Conclusions: The PAW-Net framework generates interpretable lesion overlays that facilitate rapid triage and follow-up, exhibits resilience under domain shift, and maintains an efficient computational footprint suitable for telemedicine and mobile deployment.
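For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, specificity, F1-score, and the Dice coefficient are conventionally computed from binary confusion counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; the abstract does not state how the metrics were averaged across severity levels, so this illustrates only the standard one-vs-rest definitions.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp, fp, fn, tn are true positives, false positives, false negatives,
    and true negatives. A zero denominator yields 0.0 for that metric.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        # Dice = 2*TP / (2*TP + FP + FN); algebraically equal to F1
        # when both are computed from the same binary counts.
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, F1 and Dice coincide whenever they are derived from the same binary counts, which is consistent with the identical F1-score and Dice values reported above.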