Abstract
BACKGROUND: Heart and liver diseases are among the leading causes of mortality globally. Early detection can prevent complications, reduce costs, and promote healthy lives and well-being for all. Machine learning-based predictive models, such as logistic regression (LR) and decision trees (DT), are commonly used for prediction in health science. Although LR and DT are effective, both have major drawbacks. DT assigns the same class to all observations in a branch, which can lead to overfitting, while LR often produces high misclassification rates and overfits when dealing with high-dimensional data. To address these limitations, this paper proposes a hybrid classification model that combines the strengths of DT and LR.
MATERIALS AND METHODS: A Monte Carlo simulation and an empirical study are conducted to compare the predictive performance of the proposed model with that of LR, DT, Support Vector Machine, K-Nearest Neighbors, and Random Forest. The empirical study uses two datasets: a heart disease prediction dataset (balanced, 303 observations, 14 variables) and a liver disease dataset (imbalanced, 583 observations, 11 variables).
RESULTS: The simulation results indicate that the hybrid model outperforms the other classification models considered across the sample sizes examined. These findings are consistent with the empirical results, in which the hybrid model achieves a predictive accuracy of 91% on the heart disease data and 95% on the liver disease data.
CONCLUSION: The developed hybrid model improves prediction accuracy irrespective of sample size and effectively handles both balanced and imbalanced data, reducing the need to identify suitable balancing techniques. Improved prediction can support early detection, better decision-making, and stronger health systems.
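The abstract does not state how DT and LR are combined. As an illustration only, and as an assumption rather than the authors' procedure, one possible construction lets a tree partition the data and fits a separate logistic regression in each leaf, so that observations in the same branch no longer share a single class label. The sketch below uses a depth-1 tree, pure Python, and synthetic data; every function name and the data-generating rule are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_lr(X, y, lr=1.0, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w_1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_lr(w, x):
    return 1 if w[0] + sum(a * b for a, b in zip(w[1:], x)) >= 0 else 0


def gini(labels):
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, min_leaf=10):
    """Depth-1 tree: the (feature, threshold) with the lowest weighted Gini."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(x[j] for x in X)):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # None when no split satisfies min_leaf


def fit_hybrid(X, y):
    """Assumed hybrid: split once with the tree, then fit one LR per leaf."""
    split = best_stump(X, y)
    if split is None:
        return {"split": None, "models": [fit_lr(X, y)]}
    _, j, t = split
    models = []
    for go_left in (True, False):
        idx = [i for i, x in enumerate(X) if (x[j] <= t) == go_left]
        models.append(fit_lr([X[i] for i in idx], [y[i] for i in idx]))
    return {"split": (j, t), "models": models}


def predict_hybrid(model, x):
    if model["split"] is None:
        return predict_lr(model["models"][0], x)
    j, t = model["split"]
    return predict_lr(model["models"][0 if x[j] <= t else 1], x)


def simulate(n, rng):
    """Hypothetical data: the x1 cutoff for class 1 depends on the sign of x0."""
    X, y = [], []
    for _ in range(n):
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        cutoff = 0.6 if x0 <= 0 else -0.6
        X.append([x0, x1])
        y.append(1 if x1 > cutoff else 0)
    return X, y


def accuracy(predict, X, y):
    return sum(predict(x) == yi for x, yi in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(200, rng)
hybrid = fit_hybrid(X_train, y_train)
plain = fit_lr(X_train, y_train)
print("hybrid accuracy:", accuracy(lambda x: predict_hybrid(hybrid, x), X_test, y_test))
print("plain LR accuracy:", accuracy(lambda x: predict_lr(plain, x), X_test, y_test))
```

Repeating the simulate, fit, and score steps over many random seeds and several values of `n` would give a small Monte Carlo comparison in the spirit of the study design, although the accuracies it prints say nothing about the paper's reported results.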