Abstract
Accurate prediction of blood glucose from HbA1c is crucial for personalized health monitoring and diabetes management. Existing models often overlook data imbalance and hyperglycemic outliers, which reduces their accuracy. This study proposes a machine-learning framework that addresses both issues, using a dataset of 197,180 patient samples with age, glucose, and HbA1c levels as key features. The performance of 42 machine learning models was evaluated, with kernel density estimation (KDE) analysis and targeted oversampling used to enhance model robustness. A Flask web application was developed for deployment on the Heroku platform. On the raw data, the MLP Regressor achieved a moderate R². Logarithmic transformation effectively reduced RMSE. Gaussian KDE identified a low-density region around 550 mg/dL in the hyperglycemic range, prompting targeted oversampling. Combined with the logarithmic transformation, this oversampling yielded an R² of 0.93 and the lowest RMSE with the LGBM model, indicating robust prediction of blood glucose levels. The proposed approach improves glucose prediction accuracy for both healthy individuals and people with diabetes. The Heroku-deployed web app provides an accessible tool for clinicians and individuals, supporting real-time diabetes management and personalized health monitoring. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-025-20234-z.
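The KDE-guided oversampling and logarithmic transformation described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the HbA1c relation is an arbitrary placeholder, and the 500 to 600 mg/dL window, the oversampling factor of 5, the rule-of-thumb bandwidth, and the choice to log-transform glucose as the target are all assumptions made for illustration.

```python
import math
import random
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def oversample_window(rows, low, high, factor, rng):
    """Resample rows whose glucose lies in [low, high] so that the window
    holds `factor` times as many rows as before."""
    sparse = [row for row in rows if low <= row[1] <= high]
    extra = [rng.choice(sparse) for _ in range(len(sparse) * (factor - 1))]
    return rows + extra


# Synthetic (hba1c, glucose) rows: hypothetical, not the study's data.
rng = random.Random(0)
rows = []
for _ in range(480):  # bulk of the samples, non-extreme glucose
    glucose = max(60.0, rng.gauss(130.0, 40.0))
    rows.append((4.0 + glucose / 40.0, glucose))  # placeholder relation
for _ in range(20):  # sparse hyperglycemic tail
    glucose = rng.uniform(450.0, 650.0)
    rows.append((4.0 + glucose / 40.0, glucose))

# Estimate the glucose density before oversampling.
glucose_before = [g for _, g in rows]
kde_before = gaussian_kde(glucose_before, silverman_bandwidth(glucose_before))

# Oversample the assumed low-density window around 550 mg/dL.
balanced = oversample_window(rows, 500.0, 600.0, factor=5, rng=rng)
glucose_after = [g for _, g in balanced]
kde_after = gaussian_kde(glucose_after, silverman_bandwidth(glucose_after))

# Log-transform the target; predictions are mapped back with math.exp.
log_target = [math.log(g) for g in glucose_after]

print("density at 550 mg/dL before:", kde_before(550.0))
print("density at 550 mg/dL after: ", kde_after(550.0))
```

In practice, a library estimator such as SciPy's gaussian_kde would replace the hand-written density, and the window bounds would be read off the estimated density curve rather than fixed in advance.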