Abstract
In the Kimbugak manufacturing process, sorting and loading before frying are still performed manually, which imposes a heavy workload on workers and limits scalability. This study focuses on detecting and classifying the physical characteristics of dried laver bugak to enable robotic pick-and-place operations. To this end, You Only Look Once (YOLO) and Real-time Detection Transformer (RT-DETR) deep learning detection models are developed from fused RGB and infrared (IR) images for integration into a robotic automation system. Through experiments, it was found that at least five physical classes are needed for effective robotic handling. A novel approach that fuses RGB and IR images using the Visual Geometry Group 19-layer (VGG19) network is introduced to enhance the input quality for detection. Experimental results show that the YOLOv11l model significantly outperforms the YOLOv8s model, achieving an F1 score of 0.94 and a mean Average Precision at an Intersection over Union threshold of 0.5 (mAP@0.5) of 0.95. These results demonstrate that VGG19-based image fusion combined with YOLOv11l is an optimal solution for classifying and locating dried laver bugak. This research highlights the importance of physical class definition, multimodal image fusion, and detector selection in developing an effective automated sorting and loading system for Kimbugak production.