Abstract
BACKGROUND: Class imbalance is a frequent and severe problem in medical datasets, where minority-class instances are usually high-risk or disease-positive. Most traditional classifiers exhibit a bias toward the majority class, resulting in a poor detection rate for the minority class and, therefore, reduced confidence in prediction systems for medical applications.

METHODS: In this paper, we present Optimized Ensemble by Differential Evolution (OEDE), a novel ensemble learning framework, to address this problem. OEDE combines three diverse base learners (Logistic Regression, Random Forest, and XGBoost), each trained with class-balancing techniques. The framework then applies Differential Evolution (DE) to search for the ensemble weights that maximize the area under the ROC curve (AUC) on a validation set.

RESULTS: We conducted experiments on four real-world medical datasets, with imbalance ratios ranging from 1.89 to 14.6, evaluating OEDE under the original, SMOTE-balanced, and ADASYN-balanced conditions. Experimental results demonstrate a substantial performance gain for OEDE on the challenging Thoracic dataset, where it achieves an AUC of 70.08%, outperforming the standard Random Forest (50.82%) and AdaBoost (47.15%) baselines by more than 19 percentage points. On the Cervical Cancer dataset, the model achieved a peak AUC of 97.89%. The results indicate that the proposed OEDE consistently outperforms, or is competitive with, traditional ensemble models in terms of AUC, F1-score, and Recall. ROC curve analysis further confirmed OEDE's superior discriminative capability.

CONCLUSION: The proposed OEDE framework effectively improves minority-class detection in imbalanced medical datasets. Its robust and flexible design makes it a promising tool for healthcare risk prediction tasks in which minority-class groups must be reliably identified.
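The weight-optimization step described in METHODS can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: three base learners are fit on training data, their validation-set probabilities are blended with a weight vector, and SciPy's `differential_evolution` searches for the blend that maximizes validation AUC. The synthetic dataset, the hyperparameters, and the use of `GradientBoostingClassifier` as a stand-in for XGBoost are all assumptions for the sake of a self-contained example.

```python
# Hedged sketch of DE-optimized ensemble weighting (not the paper's code).
# GradientBoostingClassifier stands in for XGBoost to avoid an extra dependency.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption: ~9:1 majority/minority ratio).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

models = [
    LogisticRegression(max_iter=1000, class_weight="balanced"),
    RandomForestClassifier(class_weight="balanced", random_state=0),
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
]

# Column i holds model i's positive-class probabilities on the validation set.
probs = np.column_stack(
    [m.fit(X_tr, y_tr).predict_proba(X_val)[:, 1] for m in models]
)

def neg_auc(w):
    # Normalize to a convex blend; DE minimizes, so return negative AUC.
    w = np.abs(w) / (np.abs(w).sum() + 1e-12)
    return -roc_auc_score(y_val, probs @ w)

res = differential_evolution(neg_auc, bounds=[(0, 1)] * 3, seed=0, tol=1e-6)
weights = np.abs(res.x) / (np.abs(res.x).sum() + 1e-12)
print("DE-optimized ensemble weights:", weights)
print("validation AUC of blend:", -res.fun)
```

In practice the optimized weights would be fixed after the search and applied to the base learners' probabilities on a held-out test set, so the reported AUC is not inflated by the validation-set tuning.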