Abstract
Sugarcane diseases significantly reduce crop yield and quality, posing persistent challenges to the agricultural sector. This study presents a novel ensemble framework that integrates Vision Transformer and Swin Transformer architectures for accurate sugarcane leaf disease detection. By combining global self-attention with localized window-based attention mechanisms, the proposed model effectively captures the multi-scale visual features associated with diverse disease symptoms. In experimental evaluation on a large, labeled sugarcane leaf dataset, the ensemble achieved a validation accuracy of 98.16% and a test accuracy of 97.06%, outperforming several convolutional neural network baselines. Additionally, a large language model (LLM) interface is employed as a post-prediction decision-support module, generating disease-specific descriptions and management suggestions based solely on the predicted disease class. These results demonstrate the potential of transformer-based ensemble models, paired with intelligent advisory support, for practical decision-making in precision agriculture.