Abstract
Recognizing abnormal goat lung sounds is challenging because of high inter-class similarity and large intra-class variability. To address these challenges and improve recognition performance, we propose AAF-SwinT, a deep learning model based on an improved Swin Transformer. The model replaces the original Swin Transformer self-attention module with Axial Decomposed Attention (ADA), which models the temporal and frequency axes separately and then integrates the resulting attention weights to mitigate inter-class feature similarity. Adaptive Spatial Aggregation for Patch Merging (ASAP) is designed to emphasize key time-frequency regions, and a Frequency-Aware Multi-Layer Perceptron (FAM) is introduced to model features across different frequency bands, further enhancing the model's ability to discriminate abnormal lung sounds. Experiments on a self-constructed goat lung sound dataset show that AAF-SwinT achieves an accuracy of 88.21%, outperforming existing mainstream Transformer-based models by 2.68 to 5.98 percentage points. Ablation studies further confirm the effectiveness of each proposed module; together, the modules raise the accuracy of the baseline Swin Transformer from 85.53% to 88.21%. These results indicate that the proposed approach is robust and has practical potential for abnormal lung sound recognition in goats, providing technical support for the early diagnosis and management of respiratory diseases in large-scale goat farming.
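To make the idea of attending along the temporal and frequency axes separately more concrete, the following is a minimal, illustrative sketch of generic axial attention over a time-frequency feature map. It is not the ADA module itself: the function names (`softmax`, `self_attention`, `axial_attention`), the use of identity query/key/value projections, and the fusion of the two axes by simple averaging are all assumptions made for brevity, since the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: identity Q/K/V projections (no learned weights), so each
    output vector is a convex combination of the input vectors.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def axial_attention(x):
    """Apply attention along each axis of x, where x[t][f] is a feature vector.

    Returns a feature map with the same (time, frequency, channel) shape.
    """
    n_time, n_freq = len(x), len(x[0])

    # Time axis: within each frequency bin, attend across time frames.
    time_out = [[None] * n_freq for _ in range(n_time)]
    for f in range(n_freq):
        column = self_attention([x[t][f] for t in range(n_time)])
        for t in range(n_time):
            time_out[t][f] = column[t]

    # Frequency axis: within each time frame, attend across frequency bins.
    freq_out = [self_attention(x[t]) for t in range(n_time)]

    # Assumption: fuse the two axial outputs by simple averaging.
    return [
        [
            [(a + b) / 2 for a, b in zip(time_out[t][f], freq_out[t][f])]
            for f in range(n_freq)
        ]
        for t in range(n_time)
    ]
```

For a map with T time frames and F frequency bins, attending along each axis in turn computes on the order of T·F·(T + F) pairwise scores rather than the (T·F)² required by full two-dimensional attention, which is the usual motivation for an axial decomposition.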