Abstract
Accurate detection of eggplant pests and diseases under real-field conditions remains challenging due to large variations in target scale, complex background clutter, and the frequent presence of small and occluded objects. To address these issues, this paper proposes Eggplant-DETR, an improved detection model based on the RT-DETR architecture. The model introduces a collaborative mechanism that integrates multi-scale feature enhancement, semantic fusion, and frequency perception. This mechanism has three key components: (1) a CPSE module, placed in the shallow layers of the backbone, enhances the detailed features of small pest targets, providing more discriminative low-level features for subsequent processing and mitigating the loss of detail for small objects; (2) a CHSFPN module integrates the detail-enhanced features from CPSE with the global semantic information from AIFI, generating multi-scale, semantically rich feature maps for the subsequent WTMANet module; (3) a WTMANet module, integrated at the end of the encoder, refines the multi-scale features from CHSFPN and significantly improves perception of irregularly shaped targets (e.g., FruitRot) and small objects (e.g., MelonThrips). Extensive experiments on a public eggplant pest and disease dataset demonstrate that the proposed method achieves an mAP50 of 77.8% while reducing the parameter count to 15.77M. Its overall performance surpasses that of the 11 mainstream models used for comparison. The framework provides an effective solution for pest and disease detection in smart agriculture and offers technical insights into the application of frequency-domain feature representation in this field.
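For readers less familiar with the reported metric, mAP50 treats a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The following is a minimal Python sketch of that overlap criterion only, assuming boxes in (x1, y1, x2, y2) corner format; the function names `iou` and `is_true_positive` are illustrative and are not taken from the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Overlap test used at the IoU = 0.5 operating point (assumed format)."""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP50 computation additionally ranks predictions by confidence, matches each ground-truth box at most once, and averages the per-class average precision; the sketch covers only the IoU threshold test.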