Abstract
The accurate identification of crop pests and diseases is critical for global food security, yet the development of robust deep learning models is hindered by the limitations of existing datasets. To address this gap, we introduce DLCPD-25, a large-scale, diverse, and publicly available benchmark dataset. DLCPD-25 integrates 221,943 images from online sources and extensive field collections, covering 23 crop types and 203 distinct classes of pests, diseases, and healthy states. A key feature of the dataset is its realistic complexity: it includes images from uncontrolled field environments and exhibits a natural long-tail class distribution, in contrast to many existing datasets collected under controlled conditions. To validate its utility, we pre-trained several state-of-the-art self-supervised learning models (MAE, SimCLR v2, and MoCo v3) on DLCPD-25. The learned representations, evaluated via linear probing on a downstream classification task, performed strongly, with SimCLR v2 performing best at 72.1% accuracy and 71.3% macro F1. These results indicate that DLCPD-25 is a valuable and challenging resource for training generalizable models, paving the way for comprehensive, real-world agricultural diagnostic systems.
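Because the dataset has a long-tail class distribution, accuracy and macro F1 can diverge noticeably. The following is a minimal sketch of how the two metrics might be computed, not the evaluation code used for DLCPD-25. The function names `accuracy` and `macro_f1`, the toy labels, and the choice to average only over classes present in the ground truth are all illustrative assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Assumption: the average runs over classes that occur in y_true,
    so every class counts equally regardless of how many samples it has.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was missed
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: a frequent "head" class and a rare "tail" class.
y_true = ["head"] * 8 + ["tail"] * 2
y_pred = ["head"] * 10  # a classifier that ignores the tail class

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In this toy case the classifier that never predicts the rare class still scores well on accuracy, while macro F1 is pulled down by the zero F1 of the tail class. This is one reason to report both metrics on a long-tailed benchmark.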