Abstract
Accurate identification of weed species at early growth stages is essential for precision agriculture. Species-level classification enables site-specific weed management, which reduces herbicide use and supports sustainable crop production. This study introduces a curated dataset of RGB images captured with a handheld Canon PowerShot SX540 HS camera at an image resolution of 5184 × 3888 pixels. The images were collected in May and June of the 2021 and 2022 growing seasons in commercial tomato fields in Santa Amalia (Badajoz). The town lies in the Vegas Altas region of Spain, an important agricultural area known for intensive tomato cultivation and processing. The dataset contains 1217 JPG images with 21,208 labelled instances and is divided into two subsets: the 2021 subset includes 938 images and 9060 instances, and the 2022 subset comprises 278 images and 11,931 instances. Each image was manually annotated to label individual plants by species. The dataset is intended for training and evaluating deep learning models, including convolutional neural networks and vision transformers, for early-stage weed detection and classification. By making it publicly available, the study supports the development of image-based monitoring systems that improve the efficiency, accuracy, and environmental sustainability of precision agriculture.