Abstract
Background: Automatic classification of adventitious lung sounds with deep learning is a promising route to objective respiratory disease screening, but evaluating model performance is difficult, particularly on imbalanced clinical datasets. This study compares convolutional neural network (CNN) architectures and proposes a dual-stream classification approach.

Methods: Using the public ICBHI 2017 dataset, we compared five pre-trained architectures: VGG16, VGG19, InceptionV3, MobileNetV2, and ResNet152V2. To mitigate class imbalance, we applied pitch shifting, random shifting, and mixup data augmentation. We also developed and evaluated a novel VGGish-dual-stream network. The primary endpoint was the Average Score (AS), defined as the arithmetic mean of sensitivity and specificity.

Results: Among the benchmarked models, ResNet152V2 achieved the highest AS (0.541), approaching the state-of-the-art range (0.56-0.58). This score combined high specificity (0.67) with low sensitivity (0.41). The proposed dual-stream network gave a more balanced but slightly lower performance, with an AS of 0.508.

Conclusions: Standard CNN architectures such as ResNet152V2 can achieve competitive classification performance but may show a clinically significant bias towards specificity at the expense of sensitivity. This trade-off risks missing pathological events (false negatives). To ensure clinical safety and utility, future work must prioritise strategies that explicitly improve model sensitivity.
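
As a minimal illustration of the primary endpoint, the sketch below computes the Average Score as the arithmetic mean of sensitivity and specificity, using the rounded ResNet152V2 values quoted in this abstract. The function name `average_score` and the range check are illustrative assumptions, not the study's code, and the sketch does not reproduce how sensitivity and specificity were derived from the ICBHI 2017 predictions.

```python
def average_score(sensitivity: float, specificity: float) -> float:
    """Return the arithmetic mean of sensitivity and specificity.

    Hypothetical helper for illustration; both inputs are expected
    to be proportions in the closed interval [0, 1].
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (sensitivity + specificity) / 2.0


# Rounded values reported for ResNet152V2 in the abstract.
resnet_as = average_score(sensitivity=0.41, specificity=0.67)
print(f"AS from rounded inputs: {resnet_as:.2f}")
```

Because the inputs are rounded to two decimals, the result agrees with the reported AS of 0.541 only up to rounding. The example also makes the reported trade-off concrete: a model can reach a moderate AS while its sensitivity stays well below its specificity, so the two components should be read alongside the mean.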