Abstract
This study aims to identify the geographical origin of black beans (Phaseolus vulgaris) rapidly and non-destructively using a portable near-infrared (NIR) spectrometer, addressing the need to distinguish black beans whose quality varies significantly by region. A total of 400 black bean samples were collected from five regions in China. To improve classification accuracy, a novel model was proposed that combines uncorrelated discriminant transform (UDT) for feature extraction with extreme gradient boosting (XGBoost) for classification. When paired with k-nearest neighbor (KNN), naive Bayes (NB), and support vector machine (SVM) classifiers, UDT achieved accuracies of 96.25 %, 93.75 %, and 96.25 %, respectively, outperforming the Foley-Sammon transform (FST) and discriminant principal component analysis (DPCA). The UDT + XGBoost combination achieved the highest classification accuracy, 100 %. For more robust validation, 5-fold cross-validation was applied to the UDT + XGBoost model, yielding an average accuracy of 96.00 %. This study provides a reliable method for black bean origin traceability and authentication.