Abstract
Traditional lithium battery defect detection suffers from low efficiency, high missed-detection rates for minute and composite defects, and inadequate multimodal fusion. To address these limitations, this study develops an improved YOLOv8 model based on multimodal fusion and attention enhancement for unified, full-lifecycle detection of multiple defect types. The model integrates visible-light and X-ray modalities and incorporates a Squeeze-and-Excitation (SE) module that dynamically weights channel features, suppressing redundancy and highlighting cross-modal complementarity. A Multi-Scale Fusion Module (MFM), built on established feature fusion principles, fuses multi-scale features to amplify the expression of subtle defects. Experimental results show that the model achieves an mAP@0.5 of 87.5%, a minute defect recall rate (MRR) of 84.1%, and an overall industrial recognition accuracy of 97.49%. It runs at 35.9 FPS on a server and 25.7 FPS on an edge device, with an end-to-end latency of 30.9 to 38.9 ms, meeting the requirements of high-speed production lines. The lightweight model is robust and outperforms YOLOv5-S, YOLOv7-S, YOLOv8-S, and YOLOv9-S on the core metrics. Large-scale verification confirms stable performance across the battery lifecycle, providing a reliable solution for industrial defect detection and reducing production costs.
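The channel weighting performed by an SE module can be illustrated with a minimal sketch. This is not the implementation used in this study: it assumes the standard squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and scale steps, omits bias terms, and uses plain Python lists in place of a deep learning framework. The function name `se_reweight` and the weight matrices `w1` and `w2` are hypothetical and chosen only for illustration.

```python
import math


def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-Excitation style channel reweighting (illustrative only).

    feature_maps: list of C channels, each an H x W list of lists of floats,
                  e.g. visible-light and X-ray channels stacked together.
    w1: (C/r) x C matrix for the reduction layer (hypothetical values).
    w2: C x (C/r) matrix for the expansion layer (hypothetical values).
    Returns (scaled_maps, channel_weights).
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]

    # Excitation, step 1: reduction layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]

    # Excitation, step 2: expansion layer followed by sigmoid,
    # giving one weight in (0, 1) per channel.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, weights)
    ]
    return scaled, weights


# Toy input: one "visible-light" channel and one "X-ray" channel (2 x 2 each).
toy_maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
toy_w1 = [[1.0, 0.0]]    # reduce 2 channels to 1 hidden unit
toy_w2 = [[0.0], [1.0]]  # expand 1 hidden unit back to 2 channel weights
toy_scaled, toy_weights = se_reweight(toy_maps, toy_w1, toy_w2)
print(toy_weights)
```

In a trained network the two weight matrices are learned, so channels that carry complementary cross-modal information receive weights closer to 1 and redundant channels are attenuated.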