Abstract
Intelligent heart sound diagnosis based on convolutional neural networks (CNNs) has attracted increasing attention because recent studies have improved its accuracy and efficiency. However, the performance of CNN models depends heavily on their parameters and structures and still has room for improvement. In this paper, we propose a heart sound classification model named CAFusionNet, which fuses features from different layers with varying resolutions and receptive field sizes. At each layer, a channel attention block weights the key features related to heart valve diseases. To address the limited dataset size, we apply a homogeneous transfer learning approach. CAFusionNet outperforms existing models on a dataset that combines public data with our proprietary data, achieving an accuracy of 0.9323. Compared with traditional deep learning methods, the transfer learning approach improves accuracy, reaching 0.9665 on the three-class classification task. The proposed methods significantly enhance heart sound classification performance, and the output data together with visualized heat maps demonstrate the importance of fusing features from different layers.
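
The idea of weighting channels at each layer and then fusing features across layers can be illustrated with a minimal NumPy sketch. This is not the CAFusionNet implementation: the abstract does not specify the attention block or the fusion operator, so the sketch assumes a squeeze-and-excitation style gate and fusion by global average pooling followed by concatenation. The function names, the feature-map shapes, the reduction ratio of 4, and the random weights are all hypothetical.

```python
import numpy as np


def channel_attention(feature_map, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Assumed squeeze-and-excitation style gate (not taken from the paper):
    global average pool -> linear + ReLU -> linear + sigmoid -> scale.
    """
    squeezed = feature_map.mean(axis=(1, 2))        # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeezed, 0.0)         # (C // r,) bottleneck with ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # (C,) weights in (0, 1)
    return feature_map * gate[:, None, None]        # broadcast over H and W


def fuse_layers(feature_maps, weights):
    """Fuse attended features from layers with different resolutions.

    Each layer is pooled to a vector so that maps of different spatial
    size can be concatenated (an assumed fusion scheme).
    """
    pooled = []
    for fmap, (w1, w2) in zip(feature_maps, weights):
        attended = channel_attention(fmap, w1, w2)
        pooled.append(attended.mean(axis=(1, 2)))
    return np.concatenate(pooled)


# Hypothetical feature maps from three layers: deeper layers have more
# channels and lower spatial resolution.
rng = np.random.default_rng(0)
shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8)]
feature_maps = [rng.standard_normal(s) for s in shapes]
weights = [
    (rng.standard_normal((c // 4, c)), rng.standard_normal((c, c // 4)))
    for c, _, _ in shapes
]

fused = fuse_layers(feature_maps, weights)
print(fused.shape)
```

In a trained model the gate weights would be learned, and the fused vector would feed a classifier head; here they are random and serve only to show the data flow.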