Abstract
In industrial inspection, image segmentation is a common method for surface inspection, used to locate and segment appearance defects on products. Most existing methods are trained for one particular product. The recent SAM (Segment Anything Model) is a foundation model for image segmentation that achieves zero-shot segmentation through diverse prompts. Nevertheless, SAM's performance on specialized downstream tasks is unsatisfactory. In addition, SAM requires manual interaction before it can segment an image, and its segmentation results require post-processing. This paper proposes SAID (Segment All Industrial Defects) to address these issues. SAID encodes a single annotated prompt-image pair into a scene embedding via its Scene Encoder, which enables automatic segmentation and removes the reliance on manual intervention. SAID's Feature Alignment and Fusion Module, in turn, addresses the alignment problem between the scene embedding and the image embedding. Experimental results demonstrate that SAID outperforms SAM in segmentation across various industrial scenes. On the one-shot target scene segmentation task, SAID also improves mIoU by 5.79 and 0.87 over MSNet and SegGPT, respectively.
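For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of one common way to compute it for binary defect masks. It is an illustration only, not the evaluation code used for SAID: the function names `iou` and `mean_iou`, the averaging over image pairs (rather than over classes), the scaling to a 0-100 range, and the treatment of two empty masks as a perfect match are all assumptions.

```python
from typing import Sequence

# A 2D binary mask: 1 marks a defect pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean IoU over (prediction, ground truth) pairs, scaled to 0-100."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return 100.0 * sum(scores) / len(scores)
```

As a usage example, `mean_iou([[[1, 1], [0, 0]], [[1, 0], [0, 0]]], [[[1, 0], [0, 0]], [[1, 0], [0, 0]]])` averages an IoU of 0.5 and an IoU of 1.0, giving 75.0. Under this convention, a reported gain such as 0.87 would correspond to a difference of 0.87 on the 0-100 scale.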