Abstract
Diabetes is one of the major diseases placing a burden on healthcare systems. One of its complications is diabetic retinopathy, which, if left untreated, can lead to serious consequences, including blindness. Early detection is critical to preventing disability and halting vision loss. In this study, we aimed to develop and validate a machine learning model for the early diagnosis of diabetic retinopathy. We are the first to conduct such research using as many as eight public databases together with one private database, collected during a project implemented by the Ministry of Digital Affairs and the Ministry of Health of Poland. In total, we analyzed 14,402 fundus photographs; a dataset of this size strengthens the credibility and reliability of our findings. A significant innovation of our approach is the use of forty-six distinct feature selection and extraction methods, drawing on techniques such as CLAHE, B-COSFIRE, and the Hough transform. We chose the XGBoost and Random Forest algorithms for classification and tuned their hyperparameters with the Optuna library. Our best model, a Random Forest classifier using LBP and GLCM features, reached a classification accuracy of 80.41%, an F1-score of 74.41%, and an AUC of 0.80. These results indicate that the model is effective for the early detection of diabetic retinopathy, although further refinement is needed before it can serve as a viable tool in clinical settings.
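
To make the LBP (local binary pattern) texture features mentioned above more concrete, the following is a minimal sketch of the basic 3x3, 8-neighbour LBP variant in plain Python. It is an illustration only: the function names `lbp_codes` and `lbp_histogram`, the neighbour ordering, and the toy `patch` values are assumptions introduced here, and the study may well have used a different LBP variant or a library implementation.

```python
from collections import Counter

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
# The ordering is an assumption; other orderings give permuted codes.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return the basic 8-neighbour LBP code of every interior pixel.

    `image` is a list of equal-length rows of grayscale intensities.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes (a texture feature vector)."""
    flat = [code for row in lbp_codes(image) for code in row]
    counts = Counter(flat)
    total = len(flat)
    return [counts.get(i, 0) / total for i in range(256)]


# Toy 4x4 grayscale patch (hypothetical values, not taken from the study).
patch = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(patch)
```

In a pipeline of the kind described, a histogram like `features` would be computed per fundus image (typically after contrast enhancement), concatenated with other descriptors such as GLCM statistics, and passed to the classifier.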