Abstract
Background/Objectives: Discrepancies in diabetic retinopathy (DR) grading are well documented, and quantification of retinal non-perfusion (RNP) poses even greater challenges. This study assessed intergrader agreement in DR evaluation, focusing on qualitative severity grading and quantitative RNP measurement, and aimed to improve agreement through structured consensus meetings. Methods: A retrospective analysis of 100 comparisons from 50 eyes (36 patients) was conducted. Two paired medical retina fellows graded ultra-widefield color fundus photographs (CFP) and fundus fluorescein angiography (FFA) images. CFP assessments included DR severity using the International Clinical Diabetic Retinopathy (ICDR) grading system, the DR Severity Scale (DRSS), and predominantly peripheral lesions (PPL). On FFA, RNP was defined as capillary loss with a grayscale matching that of the foveal avascular zone. Discrepancies were resolved through weekly adjudication by a senior specialist. Intergrader agreement was evaluated using Cohen's kappa (qualitative DRSS) and intraclass correlation coefficients (ICC) (quantitative RNP). Bland-Altman analysis assessed bias and variability. Results: After eight consensus meetings, CFP grading agreement improved to excellent: kappa = 91% (ICDR DR severity), 89% (DRSS), and 89% (PPL). FFA-based PPL agreement reached 100%. For RNP, the non-perfusion index (NPI) showed a moderate overall ICC (0.49), with regional ICCs ranging from 0.40 to 0.57 (highest in the nasal region, ICC = 0.57). Bland-Altman analysis revealed a mean NPI difference of 0.12 (limits of agreement: -0.11 to 0.35), indicating acceptable variability despite outliers. Conclusions: Structured consensus training achieved excellent intergrader agreement for DR severity and PPL grading, supporting the clinical reliability of ultra-widefield imaging. However, the variability in RNP measurement underscores the need for standardized protocols and automated tools to enhance reproducibility.
Reliable, standardized grading of this kind is critical for developing robust AI-based screening systems.
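The two simpler agreement statistics named in the Methods, Cohen's kappa and the Bland-Altman bias with its limits of agreement, can be illustrated with a minimal sketch. This is not the authors' analysis code. The function names `cohen_kappa` and `bland_altman`, the unweighted form of kappa, the 1.96 multiplier for the limits, and all sample values are assumptions made for illustration only. The ICC is not covered here.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two graders' categorical labels."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(grades_a)
    # Observed agreement: fraction of eyes given the same grade by both graders.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: product of each grader's marginal rate, summed over grades.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both graders used one identical category throughout
    return (observed - expected) / (1 - expected)


def bland_altman(values_a, values_b, z=1.96):
    """Return (bias, lower limit, upper limit) of the paired differences."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical severity grades (0-4) from two graders for eight eyes.
grader_1 = [0, 1, 2, 2, 3, 4, 1, 0]
grader_2 = [0, 1, 2, 3, 3, 4, 1, 0]
kappa = cohen_kappa(grader_1, grader_2)

# Hypothetical NPI values from the same two graders for four eyes.
npi_1 = [0.10, 0.30, 0.50, 0.20]
npi_2 = [0.05, 0.20, 0.30, 0.25]
bias, lower, upper = bland_altman(npi_1, npi_2)

print(f"kappa = {kappa:.2f}")
print(f"bias = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Under the same 1.96 assumption, the reported limits of -0.11 to 0.35 around a mean difference of 0.12 would correspond to a standard deviation of the differences of roughly 0.23 / 1.96, or about 0.12.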