Abstract
To address low localization accuracy, weak feature discrimination, and real-time constraints in small-scale metallic surface defect detection, this study proposes a novel detection framework named FF-MDE (fine-grained multi-branch diversified encoder). The model improves detection capability through three key modules: a direction-aware multi-branch convolutional block (BBDES) that strengthens orientation-sensitive feature extraction via re-parameterization; a multi-scale fusion network (MFFN) that incorporates a location-offset heterogeneous kernel selection strategy and a channel cross-embedding mechanism to align cross-resolution features and adaptively expand receptive fields; and a lightweight attention-guided up-sampling module that improves fine-detail recovery while suppressing irrelevant responses. Extensive experiments on the NEU-DET and GC10-DET datasets show that FF-MDE achieves mAP50 scores of 70.3% and 57.9%, respectively, which are 4.1% and 3.3% higher than the baseline model and, on average, 7.6% and 6.5% higher than existing methods. Moreover, the network runs at more than 60 FPS, providing a powerful and deployable solution for high-precision defect detection in industrial vision inspection systems.
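The re-parameterization idea behind a multi-branch convolutional block can be illustrated with a minimal sketch. The abstract does not specify the internal structure of BBDES, so the branch layout below (one 3x3 branch plus horizontal 1x3 and vertical 3x1 branches, single channel, no batch normalization) is an assumption chosen only to show the principle: because convolution is linear, parallel branches used during training can be merged into one equivalent kernel for inference. All function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2D image with an odd-sized kernel, using zero padding."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def pad_to_3x3(kernel):
    """Embed a 1x3, 3x1, or 3x3 kernel at the centre of a 3x3 grid of zeros."""
    kh, kw = len(kernel), len(kernel[0])
    top, left = (3 - kh) // 2, (3 - kw) // 2
    padded = [[0.0] * 3 for _ in range(3)]
    for a in range(kh):
        for b in range(kw):
            padded[top + a][left + b] = kernel[a][b]
    return padded


def fuse_branches(kernels):
    """Sum zero-padded branch kernels into a single 3x3 inference kernel."""
    fused = [[0.0] * 3 for _ in range(3)]
    for kernel in kernels:
        padded = pad_to_3x3(kernel)
        for a in range(3):
            for b in range(3):
                fused[a][b] += padded[a][b]
    return fused


# Hypothetical branch weights (assumed for illustration, not taken from the paper).
square = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # 3x3 branch
horizontal = [[1, 2, 1]]                        # 1x3 branch
vertical = [[1], [0], [-1]]                     # 3x1 branch
branches = [square, horizontal, vertical]

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 7, 6], [5, 4, 3, 2]]

# Training-time form: run every branch and add the outputs element-wise.
branch_outputs = [conv2d_same(image, k) for k in branches]
multi_branch = [
    [sum(out[i][j] for out in branch_outputs) for j in range(4)]
    for i in range(4)
]

# Inference-time form: a single convolution with the fused kernel.
fused_kernel = fuse_branches(branches)
single_branch = conv2d_same(image, fused_kernel)
```

In this sketch `multi_branch` and `single_branch` agree at every position, so the extra branches add no inference cost once fused. A practical block would also fold each branch's batch-normalization statistics into the kernel and bias before summing, which is omitted here.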