Abstract
Early and accurate detection of tumor malignancy in breast cancer is crucial for effective patient management. This study developed a fast, explainable artificial intelligence (XAI)-based pre-screening tool for breast cancer malignancy classification that requires little input data. Using a Kaggle dataset with 9 clinical and demographic features from 213 patients, 8 machine learning algorithms were compared on accuracy, sensitivity, specificity, F1 score, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC). The RUSBoost ensemble and a single decision tree both achieved the highest performance, at approximately 91.7% accuracy. The decision tree was selected for its high explainability, low computational cost, and clinical practicality. The model yields decision rules that can be stated in plain language: (1) lymph node involvement indicates malignancy; (2) metastasis indicates malignancy regardless of tumor size; and (3) in the absence of lymph node involvement and metastasis, a large tumor combined with advanced age indicates malignancy. SHapley Additive exPlanations (SHAP) analysis validated the model's decision-making process and characterized it in more detail. The model shows potential for integration into clinical decision support systems, offering rapid, reliable pre-screening with minimal data. Future validation studies with larger, more diverse populations are planned to improve generalizability.
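For illustration only, the three verbal rules can be written as an ordered rule list. The sketch below is not the fitted tree. The names `Patient` and `prescreen` are hypothetical, as are the `size_cutoff` and `age_cutoff` parameters: the abstract does not report the split thresholds, so the caller must supply them. The order in which the rules are checked and the benign fallback are also assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical record holding only the four features named in the rules."""
    age: float                  # years
    tumor_size: float           # same unit as size_cutoff
    lymph_node_involved: bool
    metastasis: bool


def prescreen(patient: Patient, size_cutoff: float, age_cutoff: float):
    """Apply the three verbal rules in order and return (label, rule that fired).

    size_cutoff and age_cutoff are placeholders for the tree's split points,
    which the abstract does not state; no default values are assumed here.
    """
    # Rule 1: lymph node involvement indicates malignancy.
    if patient.lymph_node_involved:
        return "malignant", "rule 1: lymph node involvement"
    # Rule 2: metastasis indicates malignancy, regardless of tumor size.
    if patient.metastasis:
        return "malignant", "rule 2: metastasis"
    # Rule 3: no nodes, no metastasis, but a large tumor at advanced age.
    if patient.tumor_size >= size_cutoff and patient.age >= age_cutoff:
        return "malignant", "rule 3: large tumor and advanced age"
    # Assumed fallback: no rule fired.
    return "benign", "no rule fired"
```

Returning the rule alongside the label mirrors the explainability argument made for the decision tree: each pre-screening result can be traced to the single condition that produced it.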