Abstract
PURPOSE/SIGNIFICANCE: Sugarcane is a vital global crop for sugar and energy production, and accurate, timely identification of its leaf diseases is essential for the health and stability of the sugarcane industry. While deep learning models offer promising solutions, their deployment on mobile or edge devices is often hindered by large model sizes and high computational demands; conversely, existing lightweight models frequently sacrifice feature extraction capability and recognition accuracy. To bridge this gap, this study develops an architecturally improved lightweight model that achieves both high accuracy and computational efficiency.
METHODS: We propose ReMA-MobileViT, which enhances feature representation by incorporating a newly designed Residual Multi-head Attention (ReMA) module. The module uses a multi-head attention mechanism to capture richer contextual information from diverse representation subspaces, while its residual connection mitigates network degradation and facilitates stable gradient flow. The proposed model was trained and evaluated on a publicly available sugarcane leaf disease classification dataset from the Mendeley Data repository.
RESULTS: ReMA-MobileViT achieves a classification accuracy of 99.02% on the sugarcane leaf disease dataset, surpassing existing state-of-the-art methods. An ablation study confirms the module's efficacy: integrating the ReMA module improved accuracy, recall, and F1-score by 1.58, 1.76, and 1.58 percentage points, respectively, over the baseline MobileViT. Comparative experiments further show that ReMA-MobileViT exceeds the classic lightweight MobileNetV2 by 15.77 percentage points and the mainstream Vision Transformer by 2.96 percentage points in accuracy, while requiring significantly fewer parameters and less computation than the Vision Transformer, establishing a superior balance between accuracy and efficiency.
CONCLUSION: The proposed ReMA-MobileViT model offers an effective, lightweight solution for improving sugarcane leaf disease recognition accuracy, particularly against challenging complex backgrounds. By balancing high accuracy with computational efficiency, it presents a promising technical avenue and a deployable solution for high-precision crop disease diagnosis on resource-constrained mobile or edge platforms.
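The core idea behind the ReMA module, as described above, combines multi-head self-attention with a residual shortcut. The following minimal NumPy sketch illustrates that combination only; the dimensions, projection matrices, and initialization are hypothetical, and the paper's actual ReMA module (as embedded in MobileViT) is not specified here in full detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def rema_block(x, wq, wk, wv, wo, num_heads):
    """Residual multi-head attention sketch: out = x + MHA(x).

    x:              (seq_len, dim) token features
    wq, wk, wv, wo: (dim, dim) projection matrices (illustrative)
    """
    seq_len, dim = x.shape
    head_dim = dim // num_heads

    # Project and split into heads -> (num_heads, seq_len, head_dim),
    # so each head attends over a separate subspace of the features.
    def split(w):
        return (x @ w).reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1) @ v          # (num_heads, seq_len, head_dim)

    # Merge heads, apply the output projection, then add the residual shortcut
    # that lets gradients bypass the attention branch.
    merged = attn.transpose(1, 0, 2).reshape(seq_len, dim)
    return x + merged @ wo

# Toy usage with hypothetical sizes: 8 tokens, 16-dim features, 4 heads.
rng = np.random.default_rng(0)
dim, heads, tokens = 16, 4, 8
x = rng.standard_normal((tokens, dim))
ws = [rng.standard_normal((dim, dim)) * 0.02 for _ in range(4)]
y = rema_block(x, *ws, num_heads=heads)
print(y.shape)  # (8, 16)
```

Because the attention branch is added onto the identity path, the block degrades gracefully: if the output projection contributes nothing, the input passes through unchanged, which is the property credited with mitigating network degradation.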