Abstract
Crop pests pose a significant threat to agricultural productivity, and effective pest management depends on identifying them promptly and accurately before they cause significant damage. This paper proposes HyPest-Net, a hybrid deep learning architecture that integrates convolutional neural networks (CNNs) for local feature extraction, channel and spatial attention mechanisms for refining salient features, and a vision transformer (ViT-B/16) module for modeling long-range dependencies. This combination enables accurate pest classification by addressing visually similar species, background clutter, and varied illumination, challenges that standalone CNNs or ViTs handle inadequately. Preprocessing and data augmentation are applied to improve the generalizability of the model. The model was evaluated on two benchmark datasets: a rice pest dataset (5 classes) and the dangerous farm insects dataset (15 classes). On the rice pest dataset, HyPest-Net achieved an accuracy of 0.95. The model also achieved a precision of 0.95, a sensitivity of 0.95, a specificity of 0.94, and an F1 score of 0.94. On the dangerous farm insects dataset, it achieved an accuracy of 0.93. HyPest-Net offers a lightweight yet powerful solution for real-time, explainable pest classification, supporting practical applications in precision agriculture.
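For readers unfamiliar with how precision, sensitivity, specificity, and F1 score are obtained in a multi-class setting, the following minimal sketch computes them from a confusion matrix using one-vs-rest counts per class. It assumes macro-averaging (an unweighted mean over classes); the abstract does not state which averaging scheme was used. The function name `macro_metrics` and the 3-class toy matrix are hypothetical illustrations and are not results from the paper.

```python
def macro_metrics(cm):
    """Compute accuracy and macro-averaged metrics from a confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    Macro-averaging is an assumption made for this illustration.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sums = {"precision": 0.0, "sensitivity": 0.0, "specificity": 0.0, "f1": 0.0}

    for k in range(n):
        # One-vs-rest counts for class k.
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # another class, predicted as k
        tn = total - tp - fn - fp

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # also called recall
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0

        sums["precision"] += precision
        sums["sensitivity"] += sensitivity
        sums["specificity"] += specificity
        sums["f1"] += f1

    # Unweighted mean over classes, plus overall accuracy from the diagonal.
    result = {name: value / n for name, value in sums.items()}
    result["accuracy"] = sum(cm[k][k] for k in range(n)) / total
    return result


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

if __name__ == "__main__":
    for name, value in macro_metrics(toy_cm).items():
        print(f"{name}: {value:.3f}")
```

Under macro-averaging, the reported F1 score is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and sensitivity.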