Abstract
Diabetes has become a critical global health concern, particularly in regions where access to diagnostic facilities is limited. In this work, we propose a hybrid framework for early-stage diabetes detection that combines extreme gradient boosting (XGBoost) with deep neural networks (DNNs) and uses soft voting to produce the final ensemble predictions. The framework was evaluated on two datasets: the widely used Diabetes UCI dataset and a newly collected dataset from Nepal. The ensemble achieved 99% accuracy (ACC) with an area under the curve (AUC) of 1.00 on the Diabetes UCI dataset, and 91% ACC with an AUC of 0.96 on the Nepal diabetes dataset, indicating that it generalises across two distinct populations. Compared with the individual models, the hybrid approach was more stable and produced fewer false negatives, which is particularly important in clinical contexts. These findings highlight the potential of hybrid machine learning-deep learning models as cost-effective, scalable and generalisable decision-support tools for diabetes risk assessment.
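To make the soft-voting step concrete, the following is a minimal sketch of how the predicted probabilities of two classifiers can be combined. It is not the implementation used in this work: the function name `soft_vote`, the equal weights, the 0.5 decision threshold and the example probabilities are all assumptions chosen for illustration, and the two input arrays simply stand in for the positive-class probabilities that an XGBoost model and a DNN would output.

```python
import numpy as np


def soft_vote(prob_xgb, prob_dnn, weights=(0.5, 0.5), threshold=0.5):
    """Combine two models' positive-class probabilities by weighted averaging.

    All names and default values here are illustrative assumptions,
    not settings taken from the paper.
    """
    prob_xgb = np.asarray(prob_xgb, dtype=float)
    prob_dnn = np.asarray(prob_dnn, dtype=float)
    if prob_xgb.shape != prob_dnn.shape:
        raise ValueError("both models must score the same samples")

    w_xgb, w_dnn = weights
    total = w_xgb + w_dnn
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Soft voting: average the probabilities, then threshold the average.
    ensemble_prob = (w_xgb * prob_xgb + w_dnn * prob_dnn) / total
    labels = (ensemble_prob >= threshold).astype(int)
    return ensemble_prob, labels


# Made-up probabilities for four hypothetical patients.
p_xgb = np.array([0.90, 0.40, 0.20, 0.55])
p_dnn = np.array([0.80, 0.70, 0.10, 0.35])

probs, labels = soft_vote(p_xgb, p_dnn)
print(probs)
print(labels)
```

In this sketch, the second patient is classified as positive only because the averaged probability crosses the threshold, even though one model alone would have scored it below 0.5. Lowering `threshold` would flag more borderline cases as positive, which is one generic way to trade false negatives for false positives; whether this work tunes the threshold is not stated here.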