Abstract
Chronic kidney disease (CKD) is a progressive condition with significant public health impact, and early detection is critical for effective intervention. This study proposes a novel hybrid, data-driven framework that improves CKD prediction through robust preprocessing and optimized ensemble learning. The preprocessing stage combines Random Forest (RF)-based imputation of missing values, categorical feature encoding, and the synthetic minority oversampling technique (SMOTE) to address class imbalance. The prediction stage is a weighted ensemble of the top-performing classifiers (Decision Tree, Logistic Regression, and Gaussian Naïve Bayes), whose weights are optimized with the Grey Wolf Optimizer (GWO) to enhance predictive accuracy. Evaluated on the UCI CKD dataset, the framework outperforms individual classifiers and conventional ensemble methods, achieving an accuracy of 98.75%, precision of 98.8%, recall of 98.6%, and F1-score of 98.7%. Additionally, explainable AI (XAI) techniques, including SHAP and LIME, are used to analyze feature contributions, providing interpretable insights and confirming the clinical relevance of the predictions. Overall, the proposed framework offers a transparent, reliable, and computationally efficient clinical decision support model that bridges the gap between data-driven AI and nephrology practice.
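The GWO-weighted ensemble summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes soft voting over each classifier's positive-class probability, uses plain accuracy as the fitness function, and relies only on the Python standard library. The function names (`normalize`, `accuracy`, `gwo_weights`), the search bounds, the pack size, and the toy probabilities are all hypothetical and chosen for illustration only.

```python
import random


def normalize(weights):
    """Scale positive weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def accuracy(weights, probs, labels):
    """Accuracy of weighted soft voting at a 0.5 threshold (weights sum to 1)."""
    correct = 0
    for row, y in zip(probs, labels):
        score = sum(w * p for w, p in zip(weights, row))
        correct += int((score >= 0.5) == bool(y))
    return correct / len(labels)


def gwo_weights(probs, labels, n_wolves=10, n_iters=30, seed=0):
    """Search ensemble weights with a basic Grey Wolf Optimizer."""
    rng = random.Random(seed)
    dim = len(probs[0])
    lo, hi = 0.01, 1.0  # assumed search bounds; keeps every weight positive
    # Assumption: seed the pack with equal weights so the result is never
    # worse than plain averaging on the data used for fitness.
    wolves = [[1.0] * dim]
    wolves += [[rng.uniform(lo, hi) for _ in range(dim)]
               for _ in range(n_wolves - 1)]
    best_w, best_fit = None, -1.0
    for t in range(n_iters + 1):
        scored = sorted(
            ((accuracy(normalize(w), probs, labels), w) for w in wolves),
            key=lambda fw: fw[0],
            reverse=True,
        )
        if scored[0][0] > best_fit:  # remember the best weights seen so far
            best_fit, best_w = scored[0][0], normalize(scored[0][1])
        if t == n_iters:  # final pass only evaluates the last pack
            break
        leaders = [w for _, w in scored[:3]]  # alpha, beta, delta wolves
        a = 2.0 * (1.0 - t / n_iters)  # decreases linearly from 2 toward 0
        moved = []
        for w in wolves:
            new = []
            for j in range(dim):
                x = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])
                    x += leader[j] - A * D
                # Average the three leader-guided moves, then clip to bounds.
                new.append(min(hi, max(lo, x / 3.0)))
            moved.append(new)
        wolves = moved
    return best_w


# Invented toy data: each row holds the positive-class probabilities from
# three hypothetical base classifiers; these are not results from the study.
probs = [
    [0.9, 0.7, 0.8],
    [0.2, 0.4, 0.1],
    [0.8, 0.3, 0.2],
    [0.3, 0.6, 0.7],
    [0.7, 0.6, 0.4],
    [0.1, 0.2, 0.3],
]
labels = [1, 0, 1, 0, 1, 0]

weights = gwo_weights(probs, labels)
print(weights, accuracy(weights, probs, labels))
```

In a realistic setting the fitness would be computed on held-out validation predictions rather than on the data used to fit the base classifiers, and a metric such as F1-score could replace accuracy when classes are imbalanced.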