Abstract
Non-destructive, automated detection of bacterial contamination is a critical prerequisite for high-efficiency production and quality control in plant tissue culture. In this study, we developed a multispectral image acquisition system for Alocasia explants and proposed a novel image fusion model, termed Intensity-Texture enhanced Swin Fusion (ITSF). The ITSF framework employs convolutional neural networks to extract texture and intensity features from the visible and near-infrared channels. A Swin Transformer-based module then models long-range spatial dependencies and integrates the texture and intensity features across domains. To guide the fusion process, we formulated a composite loss function that combines a texture loss, an entropy-weighted structural similarity index (SSIM) loss, and an intensity-aware, dynamic-gain-guided loss. Experimental results demonstrate that the proposed method significantly enhances the visual saliency of bacteria and achieves superior quantitative performance across a comprehensive range of objective image fusion metrics. With the fused images, detection reached a mean Average Precision at an IoU threshold of 0.5 (mAP50) of 0.949, satisfying industrial requirements for high-precision inspection. The approach thus provides a critical technical solution for the industrialization of automated micropropagation.