Abstract
Reliable seed accession identification underpins germplasm conservation, traceability and breeding; however, conventional assays remain destructive, labour-intensive and difficult to scale. Here, visible-near-infrared-shortwave-infrared (VIS-NIR-SWIR) hyperspectral imaging (HSI; 449.54-2399.17 nm; 563 bands) was used to classify 32 grain-legume accessions (n = 3200 seeds; 100 seeds per accession), comprising 30 common bean (Phaseolus vulgaris L.) landraces plus two outgroup legumes (Vigna angularis (Willd.) Ohwi & Ohashi and Cajanus cajan (L.) Huth). Each seed was represented by a single spectrum, obtained by averaging the pixels within a standardised 10 × 10 pixel region of interest (ROI) at the centre of the seed. A fixed stratified 70:30 seed-level training:test partition was used, with 70 seeds per accession (n = 2240) used for training and 30 seeds per accession (n = 960) held out as a fully independent test set. Principal component analysis (PCA) captured 97.42% of the spectral variance in the first three components (PC1 = 63.34%, PC2 = 23.78%, and PC3 = 10.31%). One-versus-rest wavelength association mapping revealed a maximum R² of 0.775 at 461.37 nm, and ReliefF placed the strongest reduced-band signal within 449.54-456.30 nm and 577.02-597.54 nm. In the original ReliefF-selected 16-band benchmark, the subspace discriminant reached 68.25% macro-F1 and 68.54% balanced accuracy; after edge-band trimming, the alternative 16-band configuration scored lower, at 60.67% and 60.94%, respectively. In the full-spectrum sensitivity benchmark, linear discriminant analysis achieved 96.35% balanced accuracy, followed by a linear support vector machine (SVM; 94.17%). Deep learning models trained directly on the full 563-band spectra reached up to 84.90% test accuracy, 84.47% macro-F1, 86.27% precision and 84.90% recall, with MLP_Wide outperforming the convolutional, recurrent and attention-based alternatives.
Overall, under controlled laboratory conditions, this benchmark shows that accession discrimination is driven mainly by visible-domain contrasts in the most compact representations, whereas the full spectral context remains important for the most confusable accessions and for cautious future sensor design. The reduced-band findings should therefore be interpreted as exploratory guidance for sensor design rather than as a validated deployment-ready specification.
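The fixed stratified 70:30 seed-level partition described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the random seed and the synthetic predictions are assumptions made only for illustration, and only the counts (32 accessions, 100 seeds each, 70 training seeds per accession) come from the abstract. The sketch also shows why, on a test set with equal numbers of seeds per accession, overall accuracy and balanced accuracy (mean per-class recall) coincide, which is consistent with the matching accuracy and recall values reported for the deep learning models.

```python
import math
import random

N_ACCESSIONS = 32          # from the abstract
SEEDS_PER_ACCESSION = 100  # from the abstract
TRAIN_PER_ACCESSION = 70   # from the abstract (70:30 split)


def stratified_split(labels, n_train_per_class, seed=0):
    """Split indices so every class contributes the same number of training items.

    The seed value is an arbitrary assumption; fixing it makes the partition fixed.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train_idx, test_idx = [], []
    for lab in sorted(by_class):
        members = by_class[lab][:]
        rng.shuffle(members)
        train_idx.extend(members[:n_train_per_class])
        test_idx.extend(members[n_train_per_class:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# One integer label per seed: accession 0..31, 100 seeds each.
labels = [a for a in range(N_ACCESSIONS) for _ in range(SEEDS_PER_ACCESSION)]
train_idx, test_idx = stratified_split(labels, TRAIN_PER_ACCESSION)
print(len(train_idx), len(test_idx))  # 2240 960

# Synthetic predictions (purely illustrative): every 7th test seed is
# deliberately assigned to the next accession.
y_true = [labels[i] for i in test_idx]
y_pred = [(lab + 1) % N_ACCESSIONS if i % 7 == 0 else lab
          for i, lab in enumerate(y_true)]

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(math.isclose(acc, bal))  # True, because every class has 30 test seeds
```

With unequal class sizes the two metrics would generally diverge, which is one reason a per-accession stratified split simplifies the interpretation of the reported scores.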