Abstract
PURPOSE: To investigate the fairness of existing deep models for diabetic retinopathy (DR) detection and introduce an equitable model to reduce group performance disparities. METHODS: We evaluated the performance and fairness of various deep learning models for DR detection using fundus images and optical coherence tomography (OCT) B-scans. A Fair Adaptive Scaling (FAS) module was developed to reduce group disparities. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), and equity across various groups was assessed by equity-scaled AUC, which accommodated both overall AUC and AUCs of individual groups. RESULTS: Using color fundus images, the integration of FAS with EfficientNet improved the overall AUC and equity-scaled AUC from 0.88 and 0.83 to 0.90 and 0.84 (P < 0.05) by race. AUCs for Asians and Whites increased by 0.05 and 0.03, respectively (P < 0.01). For gender, both metrics improved by 0.01 (P < 0.05). Using DenseNet121 on OCT B-Scans by race, FAS improved the overall AUC and equity-scaled AUC from 0.875 and 0.81 to 0.884 and 0.82, with gains of 0.03 and 0.02 for Asians and Blacks (P < 0.01). For gender, DenseNet121's metrics rose by 0.04 and 0.03, with gains of 0.05 and 0.04 for females and males (P < 0.01). CONCLUSIONS: Deep learning models demonstrate varying accuracies across different groups in DR detection. FAS improves equity and accuracy of deep learning models. TRANSLATIONAL RELEVANCE: The proposed deep learning model has a potential to improve both model performance and equity of DR detection.