Abstract
OBJECTIVES: Cardiovascular disease (CVD) remains one of the leading causes of global mortality, accounting for millions of deaths annually. Early and accurate diagnosis is critical to reducing mortality and healthcare burden, yet conventional diagnostic approaches often suffer from misdiagnosis, delayed treatment, and increased medical costs. Machine learning (ML) has shown significant potential for supporting clinical decision-making in early CVD detection, but ML models face challenges such as computationally expensive hyperparameter tuning and susceptibility to local minima. This study addresses these challenges by proposing a bio-inspired optimization framework to enhance diagnostic accuracy and efficiency.
METHODS: Bacterial Colony Optimization (BCO) is used to optimize the hyperparameters of ten ML classifiers: Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors, Multilayer Perceptron, Naïve Bayes, Random Forest (RF), Decision Tree, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine, and AdaBoost. Principal Component Analysis (PCA) is integrated to reduce feature dimensionality and handle multicollinearity. Experiments were conducted on the Cleveland Heart Disease dataset (CLE) and the IEEE DataPort dataset (HGR) using 5-fold cross-validation (CV) to assess reliability and stability.
RESULTS: Combining PCA, BCO, and ML classifiers improved prediction performance over baseline models. The BCO-optimized RF model achieved the highest mean accuracy, 92.02% (95% CI: 89.93-94.10), on the HGR dataset, compared with a baseline accuracy of 91.26%. Similarly, the BCO-SVM model achieved a mean accuracy of 85.79% on the CLE dataset. Confidence interval analysis further indicated enhanced model stability and reduced prediction variance.
CONCLUSION: The proposed framework enhances CVD diagnosis by improving classification accuracy and stability. By efficiently exploring the hyperparameter search space and reducing the risk of convergence to local minima, it provides a statistically robust and clinically reliable decision-support tool for early cardiovascular risk detection.
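As an illustration of how a mean accuracy with a 95% confidence interval can be derived from 5-fold CV scores, the following minimal sketch may help. The abstract does not state how the reported interval was computed, so the Student's t interval used here is an assumption. The function name `mean_ci95`, the lookup table `T_CRIT_95`, and the per-fold accuracies are hypothetical and are not taken from the study.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only a few small-sample entries are listed; df = 4 corresponds to 5 folds.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def mean_ci95(fold_scores):
    """Return (mean, lower, upper) of a t-based 95% CI over CV fold scores.

    Assumption: fold scores are treated as an i.i.d. sample, which is a
    common simplification for k-fold CV and not necessarily the study's method.
    """
    k = len(fold_scores)
    df = k - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for {k} folds")
    mean = statistics.fmean(fold_scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(fold_scores) / math.sqrt(k)
    half_width = T_CRIT_95[df] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold accuracies in percent (illustrative only).
folds = [90.0, 92.0, 94.0, 91.0, 93.0]
mean, lower, upper = mean_ci95(folds)
print(f"{mean:.2f}% (95% CI: {lower:.2f}-{upper:.2f})")
```

A narrower interval at a similar mean indicates less fold-to-fold variation, which is the sense in which a confidence interval analysis can support a claim of improved stability.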