Abstract
Industrial defect classification remains challenging because the defects encountered in manufacturing are visually complex, rare, and diverse. In this paper, we present a hybrid deep learning framework that integrates YOLOv11 and EfficientNet-B7 for robust multi-class defect classification. The model combines the semantic richness of YOLO's spatial features with the fine-grained representation power of EfficientNet, and refines the fused features with a Convolutional Block Attention Module (CBAM) and a lightweight Feature Pyramid Network (FPN) for attention-guided multi-scale refinement. Unlike conventional anomaly detectors or class-specific models, our framework performs unified classification across diverse object categories and defect types. We evaluate the proposed model on two datasets: the MVTec-FS benchmark, which includes 46 defect types across 14 industrial categories, and our proprietary Window dataset, which comprises three real-world defect classes captured under variable conditions. The model achieves state-of-the-art accuracy of 91.90% on MVTec-FS and 96.13% on the Window dataset, outperforming existing CNN, transformer, and ensemble baselines. Ablation studies show that each module contributes incrementally to overall performance. These results indicate that the model generalizes across domains and is of practical use in industrial quality control pipelines.