Abstract
Ship detection is a foundational task for marine environmental perception. In real marine scenarios, however, dense vessel traffic often causes severe target occlusion, multi-scale targets and asymmetric vessel geometries add further difficulty, and harsh conditions (e.g., haze, low illumination) degrade image quality. These factors pose significant challenges to vision-based ship detection methods. To address these issues, we propose M2S-YOLOv8, an improved framework based on YOLOv8 that integrates three key enhancements. First, a Multi-Scale Asymmetry-aware Parallelized Patch-wise Attention (MSA-PPA) module is designed in the backbone to strengthen the perception of multi-scale and geometrically asymmetric vessel targets. Second, a Deformable Convolutional Upsampling (DCNUpsample) operator is introduced in the neck network to enable adaptive feature fusion with high computational efficiency. Third, a Wasserstein-Distance-Based Weighted Normalized CIoU (WA-CIoU) loss function is developed to alleviate gradient imbalance in small-target regression, thereby improving localization stability. Experimental results on the Unmanned Vessel Zhoushan Perception Dataset (UZPD) and the open-source Singapore Maritime Dataset (SMD) demonstrate that M2S-YOLOv8 strikes a balance between lightweight design and real-time inference, indicating strong potential for reliable deployment on edge devices aboard unmanned marine platforms.
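
To illustrate why a Wasserstein-based term can help small-target regression, the following is a minimal sketch of the standard normalized Gaussian Wasserstein distance between two boxes. It is not the WA-CIoU loss itself, whose weighting scheme is not specified in this abstract. Each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The function name `normalized_wasserstein` and the scale constant `c` are assumptions made for illustration, and the default value of `c` is an arbitrary placeholder.

```python
import math


def normalized_wasserstein(box_a, box_b, c=12.8):
    """Similarity in (0, 1] between two boxes given as (cx, cy, w, h).

    Illustrative only: `c` is an assumed scale hyperparameter, and this is the
    generic normalized Wasserstein similarity, not the paper's WA-CIoU loss.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    # The exponential maps the distance to a bounded similarity score.
    return math.exp(-math.sqrt(w2_squared) / c)


if __name__ == "__main__":
    small_gt = (10.0, 10.0, 4.0, 4.0)
    near_pred = (12.0, 10.0, 4.0, 4.0)   # overlaps the ground-truth box
    far_pred = (30.0, 10.0, 4.0, 4.0)    # no overlap, so IoU would be zero
    print(normalized_wasserstein(small_gt, near_pred))
    print(normalized_wasserstein(small_gt, far_pred))
```

Unlike IoU, this similarity stays non-zero and varies smoothly for small boxes that do not overlap, so a regression loss built on it still provides a gradient signal in that case.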