Abstract
Face mask detection accuracy degrades under mutual occlusion, lighting variations, and varying detection distances. To address these challenges, this paper proposes a face mask detection algorithm tailored for complex environments. First, we construct a comprehensive face mask dataset. Then, building on the YOLOv8 architecture, we enhance the C2f module in the backbone network with depthwise separable convolutions to better capture the color and texture features of the target, and we integrate the SENet attention mechanism to further optimize feature extraction efficiency. To improve the transmission of fine-grained face mask features through the network, we introduce context-aware convolutions in the Neck module, which facilitate the integration of contextual semantic information and enrich the feature details of small targets. Building on this, we design an enhanced detection head, DAM-Head, which amplifies target saliency and improves both recognition and localization accuracy. Experimental results show that the proposed algorithm achieves a mean Average Precision (mAP) of 98.11% at 135.61 frames per second (FPS) on the constructed dataset, outperforming other mainstream algorithms in both accuracy and real-time performance.
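
As a rough illustration of two building blocks named in the abstract, the following Python sketch compares the weight count of a standard convolution with that of a depthwise separable convolution, and applies a squeeze-and-excitation (SENet-style) channel gate to a small feature map. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names, the layer sizes, and the weight matrices `w1` and `w2` are hypothetical, and biases are omitted.

```python
import math


def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def se_reweight(feature_map, w1, w2):
    """Squeeze-and-excitation on a feature map stored as [channel][row][col].

    w1 (hidden x channels) and w2 (channels x hidden) are hypothetical
    fully connected weights; a trained network would learn them.
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return gates, scaled


if __name__ == "__main__":
    # Assumed layer size for illustration: 64 -> 128 channels, 3 x 3 kernel.
    print(conv_params(64, 128, 3), depthwise_separable_params(64, 128, 3))
    # prints: 73728 8768
```

For this assumed layer size the depthwise separable variant needs roughly one eighth of the weights, which is the usual motivation for using it in a real-time detector. The gate values returned by `se_reweight` lie in (0, 1) and rescale each channel as a whole.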