Abstract
Side-scan sonar (SSS) serves as the primary perceptual instrument for Autonomous Underwater Vehicles (AUVs) in large-scale marine search and rescue (SAR) operations. However, the detection of critical targets is frequently hindered by severe hydro-acoustic noise, the spatial discontinuity of wreckage, and the weak visual signatures of small targets. To surmount these challenges, this paper presents WPG-DetNet. First, we introduce a Wavelet-Embedded Residual Backbone (WERB) to reconstruct the conventional downsampling paradigm. By substituting standard pooling with the Discrete Wavelet Transform (DWT), this architecture explicitly disentangles high-frequency noise from structural information in the frequency domain, thereby achieving the adaptive preservation of edge fidelity for large human-made targets while filtering out speckle interference. Then, addressing the distinct challenge of discontinuous aircraft wreckage, the framework further incorporates a Debris Graph Reasoning Module (D-GRM). This module models scattered fragments as nodes in a topological graph to capture long-range semantic dependencies, transforming isolated instance recognition into context-aware scene understanding. Finally, to bridge the gap between AI and underwater physics, we design a Shadow-Aided Decoupling Head (SADH) equipped with a physics-informed geometric loss. By enforcing mathematical consistency between target height and acoustic shadow length, this mechanism establishes a rigorous discriminative criterion capable of distinguishing weak-echo human bodies from seabed rocks based on shadow geometry. Experiments on the SCTD dataset demonstrate that WPG-DetNet achieves a mean Average Precision (mAP50) of 97.5% and a Recall of 96.9%. Quantitative analysis reveals that our framework outperforms the classic Faster R-CNN by a margin of 12.8% in mAP50 and surpasses the Transformer-based RT-DETR-R18 by 5.6% in high-precision localization metrics (mAP50:95). Simultaneously, WPG-DetNet maintains superior efficiency with an inference speed of 62.5 FPS and a lightweight parameter count of 16.8 M, striking an optimal balance between robust perception and the real-time constraints of AUV operations.