Abstract
Underwater acoustic target recognition is critical for a broad range of marine applications, yet its performance is often hindered by environmental variability, non-stationary propagation effects, and inherently low signal-to-noise ratios. This work presents a hybrid deep learning framework that integrates global and local acoustic representations to improve recognition robustness under such adverse conditions. The proposed approach employs a dual-branch encoder: a pre-trained self-supervised Audio Spectrogram Transformer branch captures long-range temporal dependencies, and a multi-scale convolutional branch extracts fine-grained local spectral patterns. To further improve decision stability and mitigate uncertainty near classification boundaries, we introduce a Gaussian sampling-based classification module, which models class-specific weights as probability distributions and performs Monte Carlo inference. Experiments on two representative underwater acoustic benchmarks demonstrate that the proposed method not only achieves state-of-the-art recognition accuracy but also remains resilient across different environmental noise conditions. Ablation analyses further validate the complementary advantages of local-global feature fusion and the probabilistic decision mechanism. These findings suggest that the proposed hybrid architecture offers a promising and practical solution for robust underwater acoustic classification in real-world operational scenarios.
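
To make the Gaussian sampling-based classification idea concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a linear classifier whose class-specific weights follow independent (diagonal) Gaussians, and it averages softmax outputs over repeated weight samples as a simple form of Monte Carlo inference. The names `mc_predict`, `mu`, `sigma`, and `feature`, along with all numeric values, are hypothetical and chosen only for illustration; the fused local-global feature is represented by a plain list.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(feature, mu, sigma, n_samples=100, seed=0):
    """Average class probabilities over sampled classifier weights.

    Assumption (not taken from the paper): the weight vector of each class
    is drawn from a diagonal Gaussian with mean mu[c] and std sigma[c].
    """
    rng = random.Random(seed)
    n_classes = len(mu)
    mean_probs = [0.0] * n_classes
    for _ in range(n_samples):
        logits = []
        for c in range(n_classes):
            # Draw one weight vector for class c.
            weights = [rng.gauss(m, s) for m, s in zip(mu[c], sigma[c])]
            logits.append(sum(w * x for w, x in zip(weights, feature)))
        probs = softmax(logits)
        for c in range(n_classes):
            mean_probs[c] += probs[c] / n_samples
    return mean_probs


# Hypothetical fused feature and weight distributions for three classes.
feature = [1.0, 0.5, -0.2]
mu = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
sigma = [[0.1, 0.1, 0.1] for _ in range(3)]

probs = mc_predict(feature, mu, sigma)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

Averaging probabilities rather than taking a single point estimate of the weights is one common way to smooth decisions for inputs that lie close to a class boundary; the spread of the per-sample predictions could additionally serve as an uncertainty signal under the same assumptions.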