Abstract
BACKGROUND: Breast cancer remains one of the leading causes of cancer-related mortality among women worldwide, with more than 2.3 million new cases and approximately 670,000 deaths reported globally in 2022. Early and accurate diagnosis significantly improves survival rates; however, conventional diagnostic approaches are often time-consuming and subject to inter-observer variability. Although machine learning techniques have shown promising results, many existing studies lack systematic hyperparameter optimization and robust strategies for improving model generalization. This study aimed to develop an optimized and interpretable K-Nearest Neighbour (KNN) framework for breast cancer classification.
METHODS: The Breast Cancer Wisconsin (Diagnostic) Dataset (WDBC), comprising 569 samples with 32 attributes (a sample identifier, the diagnosis label, and 30 numeric features), was used for model development and evaluation. The proposed framework combined advanced preprocessing, biologically informed feature engineering, hybrid feature selection, and systematic hyperparameter tuning with GridSearchCV. An ensemble of KNN models combined by soft voting was introduced to improve predictive stability and performance. Local Interpretable Model-Agnostic Explanations (LIME) were used to identify the features contributing to individual malignant and benign classifications.
RESULTS: The optimized KNN model achieved an accuracy of 98.25%, and the ensemble KNN model reached 99.12%. The proposed framework showed high predictive performance, improved classification stability, and enhanced interpretability through feature-level explanation analysis.
CONCLUSIONS: The findings demonstrate the methodological effectiveness of an optimized, ensemble-based KNN framework for breast cancer classification. Although the results indicate strong benchmark performance on the WDBC dataset, the study primarily highlights methodological robustness rather than immediate clinical generalizability. Further validation on multi-center clinical datasets is required before practical deployment in decision-support systems.
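The tuning and soft-voting steps named in METHODS can be illustrated with a minimal sketch. It assumes scikit-learn's bundled copy of WDBC (`load_breast_cancer`, which exposes the 30 numeric features), a stratified 80/20 split, a small illustrative parameter grid, and three ensemble members that differ only in `n_neighbors`. All of these choices are assumptions made for illustration, not the study's actual configuration. The feature engineering, feature selection, and LIME steps are omitted, so the sketch is not expected to reproduce the reported accuracies.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's copy of WDBC: 569 samples, 30 numeric features, binary label.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: stratified 80/20 hold-out (not stated in the abstract).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 1: tune a single KNN. Scaling sits inside the pipeline so that each
# cross-validation fold is standardized using its own training portion only.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {  # illustrative grid, an assumption
    "kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 11],
    "kneighborsclassifier__weights": ["uniform", "distance"],
    "kneighborsclassifier__p": [1, 2],  # Manhattan vs. Euclidean distance
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
tuned_accuracy = search.score(X_test, y_test)


def scaled_knn(k):
    """Hypothetical helper: a scaled KNN pipeline with k neighbours."""
    return make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=k, weights="distance"),
    )


# Step 2: soft voting averages the members' predicted class probabilities
# and predicts the class with the highest mean probability.
ensemble = VotingClassifier(
    estimators=[("knn3", scaled_knn(3)),
                ("knn7", scaled_knn(7)),
                ("knn11", scaled_knn(11))],
    voting="soft",
)
ensemble.fit(X_train, y_train)
ensemble_accuracy = ensemble.score(X_test, y_test)

print("best parameters:", search.best_params_)
print(f"tuned KNN accuracy:    {tuned_accuracy:.4f}")
print(f"ensemble KNN accuracy: {ensemble_accuracy:.4f}")
```

Placing the scaler inside each pipeline is a deliberate choice here: KNN is distance-based, so unscaled features with large ranges would dominate the neighbour search, and fitting the scaler outside cross-validation would leak held-out statistics into the tuning step.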