Abstract
BACKGROUND: Interpretability is a key requirement for ensuring that credit risk assessment models are trustworthy and compliant with regulatory standards. Simultaneously, effectively distinguishing between noise samples and boundary samples is crucial for improving the accuracy of credit risk predictions. METHODS: This article introduces a novel credit risk assessment model, Interpretable Credit Risk Assessment Model with Identifying Boundary Samples (IAIBS). The model begins with a logistic regression sub-model that offers strong self-interpretable features. For samples that are not correctly classified, the Attribute Recognition and Perception based on the Distribution of neighboring sample features (ARPD) algorithm is applied to filter out noisy samples and identify boundary samples. A deep learning sub-model is then trained to deeply learn the risk features of these boundary samples. Finally, representative features of all samples are extracted using agglomerative clustering, and the most suitable sub-model is selected for prediction based on the similarity between each sample and the cluster centers. RESULTS: Experimental results on four public datasets demonstrate that the IAIBS model significantly outperforms 11 baseline models, as confirmed by the Nemenyi test. The model achieved area under the curve (AUC) scores of 89.17, 79.86, 97.48, and 66.03 on the PCL, FICO, CCF, and VL datasets, respectively. With appropriate parameter tuning, the IAIBS model maintains strong generalization ability, and each module contributes positively to overall performance. Additionally, the IAIBS model effectively interprets key predictors and prediction outcomes.