Abstract
To address the practical challenges of insufficient samples, incomplete categories, and low detection accuracy (particularly for small targets) in monitoring the wearing of Personal Protective Equipment (PPE) by operators in offshore environments, this research investigates PPE target detection for offshore operators using an improved YOLOv11 model. The optimized model integrates a time-frequency feature enhancement module (Spatial Pyramid Pooling-Fast, SFEAF) into the backbone network, employs a statistics-driven dynamic gating attention module (Token Statistics Self-Attention, TSSA) to refine the attention weight distribution in the original C2PSA module, and incorporates a Normalized Wasserstein Distance (NWD) loss function. Together, these modifications enhance the model's ability to detect PPE targets worn by offshore operators. To mitigate missed detections of small targets such as earplugs and gloves, a cascaded network of the YOLOv11 and YOLOv11-Pose models is proposed. The method extracts human key points with the YOLOv11-Pose model, constructs spatial constraint regions with a two-point area positioning method, enhances small-target features through localized region cropping and normalization, and performs secondary detection on the refined regions with the YOLOv11 model. Ablation experiments show that the optimized model improves mAP@0.5 over all targets by 1.8 percentage points compared with the original model, and that the precision for the positive and negative samples of the small targets (earplugs and gloves) improves by 5.2%, 4.2%, 0.2%, and 3.7%, respectively, demonstrating the advantage of the optimization method. Furthermore, secondary detection experiments on small targets yield an average Missed Detection Recovery Rate (MRR) of 56.64%, verifying the effectiveness of the multi-model cascaded detection method.
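
As an illustration of why an NWD term can help with small targets, the following minimal sketch computes the commonly used closed form of the Normalized Wasserstein Distance between two boxes, each modeled as a 2D Gaussian. It is not the implementation used in this work: the function names `nwd` and `nwd_loss`, the `(cx, cy, w, h)` box format, and the normalizing constant `c` (a dataset-dependent value, set here to an arbitrary placeholder) are assumptions made for the example.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a 2D Gaussian with mean (cx, cy) and
    covariance diag((w/2)^2, (h/2)^2). The constant c is an assumed,
    dataset-dependent normalizer (placeholder value).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred_box, target_box, c=12.8):
    """Loss term that is 0 for identical boxes and approaches 1 as they diverge."""
    return 1.0 - nwd(pred_box, target_box, c)


if __name__ == "__main__":
    # Two small, non-overlapping boxes: IoU would be 0, but NWD still
    # varies smoothly with the offset between them.
    pred = (10.0, 10.0, 4.0, 4.0)
    target = (13.0, 14.0, 4.0, 4.0)
    print(nwd(pred, target), nwd_loss(pred, target))
```

Unlike IoU, which drops to zero as soon as two small boxes stop overlapping, this similarity decays smoothly with the offset between box centers and sizes, which is the property that makes it attractive for small targets such as earplugs and gloves.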