Abstract
Diabetes mellitus (DM) is a chronic metabolic disorder and a major global health problem, and many cases remain undiagnosed. Early detection and effective management are essential to prevent complications. This paper presents SMENN-Hybrid, an efficient hybrid technique that combines the Synthetic Minority Oversampling Technique with Edited Nearest Neighbors (SMOTE-ENN) and ensemble learning. Gradient Boosting was identified as the most effective ensemble method through multi-metric evaluation. The proposed approach was evaluated on five diverse datasets: PIMA India, Diabetes Prediction Dataset (DPD), Diabetes Dataset 2019, Raw Merged Dataset (RMD), and Cleaned Merged Dataset (CMD). A multi-metric assessment based on F1-Score, ROC-AUC, and Accuracy demonstrated strong generalizability: Gradient Boosting achieved a composite score of 99.93/100 and kept the coefficient of variation (CV) below 2% across all metrics (mean F1 = 0.9860, ROC-AUC = 0.9990, Accuracy = 0.9860). Five-fold stratified cross-validation confirmed this stability (overall CV < 1.65% for all metrics), while systematic ablation studies validated the synergy between SMOTE and ENN, showing average improvements of +16.78% in F1-Score and +29.47% in Recall over unbalanced baselines. Compared with traditional methods (Logistic Regression and Decision Tree), the proposed framework improved the average F1-Score by +2.99% over the best baseline, with individual dataset gains ranging from +3.25% to +3.99%. Although training takes 246× longer, inference remains practical at 2.47 ms, making the approach suitable for real-time clinical deployment.
Its high effectiveness (mean F1 = 0.9841), consistency (coefficient of variation below 2%), and validation across multiple datasets and evaluation dimensions position this framework as a clinically deployable solution for diabetes detection without dataset-specific tuning, and suggest that it may transfer to similar healthcare classification tasks.
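The stability figures above are coefficients of variation, that is, the standard deviation of a metric divided by its mean, expressed as a percentage. A minimal sketch of that calculation follows. The per-fold scores are hypothetical values chosen only for illustration and are not results from this paper, the function name `coefficient_of_variation` is ours, and the use of the population standard deviation is an assumption, since the estimator is not specified here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return the coefficient of variation (std / mean) as a percentage.

    Assumption: population standard deviation (pstdev). Using the sample
    standard deviation instead would give a slightly larger value.
    """
    mu = mean(scores)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(scores) / mu * 100.0


# Hypothetical per-fold F1 scores from a 5-fold run.
# Illustrative values only, NOT results reported in this paper.
fold_f1 = [0.98, 0.99, 0.97, 0.99, 0.98]

cv_percent = coefficient_of_variation(fold_f1)

# Stability criterion quoted in the abstract: coefficient of variation below 2%.
is_stable = cv_percent < 2.0

print(f"mean F1 = {mean(fold_f1):.4f}, CV = {cv_percent:.2f}%, stable = {is_stable}")
```

A lower coefficient of variation means the metric changes less from fold to fold relative to its average, which is the sense in which the abstract uses "stability".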