Abstract
Plant disease diagnosis based on visual symptoms is crucial for preventing yield loss; however, deployment in practical settings remains challenging due to inter-class similarity, background noise, and limited computational resources. This study presents a plant disease classification framework evaluated on a curated multi-crop dataset aggregated from multiple publicly available repositories, comprising 51 disease and healthy classes. The dataset contains approximately 45,000 original images, expanded through controlled augmentation during training to improve generalization. We benchmark eight ImageNet-pretrained tiny vision transformer architectures, each trained for up to 50 epochs. Among these, CAFormer-s18 achieved strong validation performance but at increased computational cost. To provide lightweight alternatives, we design two custom convolutional neural networks: PlantaNetLite (1.28M parameters) and PlantaNet (2.58M parameters). After hyperparameter optimization and full 100-epoch training, PlantaNet achieved 99.37% validation accuracy and 99.66% test accuracy with a compact model size (9.85 MB) and moderate computational cost, while PlantaNetLite reached a best validation accuracy of 99.22% with fewer than half the parameters. Qualitative Grad-CAM and Grad-CAM++ analyses provide insight into the image regions that influence model predictions. Overall, the proposed models demonstrate competitive accuracy while maintaining computational efficiency, highlighting their suitability for resource-constrained deployment scenarios.