Abstract
To address the recognition difficulties and misjudgments caused by varying target scales, diverse shapes, and interference such as lake-surface reflections in river and lake scenarios, this paper proposes the YOLO v11n-DDH model for fast and accurate detection of spatial targets in river and lake environments. The model builds on YOLO v11n in three ways. It introduces Dynamic Snake Convolution (DySnakeConv) to improve the extraction of fine-grained features. It integrates a Deformable Attention mechanism (DAttention) to strengthen key features and suppress noise. It also adopts an improved High-Level Screening Feature Pyramid Network (HSFPN) for multi-level feature fusion, which improves the semantic representation of targets at different scales. Experiments on a self-constructed dataset show that the YOLO v11n-DDH model achieves a precision of 88.4%, a recall of 78.9%, and an mAP of 83.9%, which are improvements of 3.4, 2.9, and 2.5 percentage points, respectively, over the original model. Specifically, DySnakeConv increases mAP@50 by 0.6 percentage points, DAttention by 0.3 percentage points, and HSFPN by 0.9 percentage points. The patrol system built on the model can effectively identify and visualize pollution and violations in river and lake areas, including underwater waste, water pollution, illegal swimming and fishing, and the "Four Chaos" issues, providing technical support for intelligent river and lake management.