Abstract
Rapid, intelligent methods for evaluating wheat quality are urgently needed. Using wet gluten content as a case study, this work systematically investigated the performance of several machine learning algorithms, and their optimization, for predicting content from visible and near-infrared (NIR) hyperspectral data of wheat grains and flour. The random forest regression (RFR) algorithm delivered the best predictive performance in two cases: when applied directly to visible spectra, and when applied to fused visible and NIR spectra. This held for both grains and flour. In contrast, RFR applied directly to NIR spectra alone performed comparatively worse. After data optimization, the first-derivative (FD) visible spectra of wheat grains were smoothed with a Savitzky-Golay (SG) filter and used as input to the RFR model. This optimized approach achieved a coefficient of determination (R²) of 0.8579, a root mean square error (RMSE) of 0.0216, and a relative percent deviation (RPD) of 2.6978. Under the same conditions, the corresponding values for wheat flour were 0.8383, 0.0231, and 2.5293. When the RFR model was instead applied to the SG-filtered FD spectra derived from the fused visible and NIR data, wheat flour yielded an R² of 0.8474, an RMSE of 0.0224, and an RPD of 2.6034, and wheat grains yielded an R² of 0.8494, an RMSE of 0.0223, and an RPD of 2.6208. This efficient, rapid prediction scheme shows considerable potential for the quality assessment and control of related food products.
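The three reported figures of merit can be illustrated with a minimal sketch. The abstract does not state the formulas used, so the definitions below are assumptions based on common chemometric practice: R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the reference values divided by the RMSE. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired reference and predicted values.

    Assumed definitions (not confirmed by the source):
      R^2  = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample standard deviation of y_true / RMSE
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_true = statistics.fmean(y_true)
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(y_true))
    rpd = statistics.stdev(y_true) / rmse  # stdev uses the n - 1 denominator
    return r2, rmse, rpd


# Hypothetical wet gluten contents expressed as mass fractions
reference = [0.28, 0.31, 0.25, 0.34, 0.30]
predicted = [0.27, 0.32, 0.26, 0.33, 0.29]

r2, rmse, rpd = regression_metrics(reference, predicted)
print(f"R2={r2:.4f}  RMSE={rmse:.4f}  RPD={rpd:.4f}")
```

Under these assumed definitions, RPD and R² are linked by RPD² = n / ((n − 1)(1 − R²)) when both are computed on the same n samples, so a higher R² on a given sample set implies a higher RPD.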