Abstract
Cardiovascular diseases (CVDs) remain a leading global health challenge, necessitating diagnostic solutions that combine high accuracy with clinical interpretability and reproducibility. Traditional auscultation relies extensively on clinician expertise, resulting in variability and potential diagnostic delays, especially for subtle murmurs indicative of underlying cardiac abnormalities. While automated methods have improved diagnostic accuracy, they frequently lack reproducibility and transparency, contributing to clinical mistrust. To address these challenges, we propose an Explainable Attention-Based Deep Learning framework specifically designed for the classification and interpretation of heart murmurs from phonocardiogram (PCG) signals. Our approach employs a Transformer architecture tailored for robust time-frequency feature extraction, operating on representations such as spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs) derived from PCG recordings. Visual explanations generated through Gradient-weighted Class Activation Mapping (Grad-CAM) explicitly highlight the critical systolic and diastolic murmur segments driving the model's diagnostic predictions. We rigorously validated our framework across multiple datasets, including the HeartWave dataset (over 1,300 recordings), and further corroborated our results using the CirCor DigiScope, PhysioNet, and Shenzhen datasets. Our revised validation strategy, adopting robust A-Test methods, demonstrated enhanced reliability with an accuracy of 96.7%, a macro-F1 score of 95.5%, and an AUC above 0.97. Compared to ten baseline models, including CNN-RNN hybrids, ResNet variants, and Time Growing Neural Networks (TGNNs), our framework showed a 3-5% improvement in accuracy and a 2-4% increase in macro-F1 score, particularly excelling in identifying rare conditions such as valvular defects and congenital anomalies.
Ablation studies underscored the crucial role of attention mechanisms in both accuracy enhancement and interpretability, showing strong alignment between model-generated explanations and expert annotations. Future work will explore model scalability, robustness in diverse clinical environments, and integration with multimodal data, including electrocardiograms, aiming for comprehensive and clinically trusted diagnostic support.