Abstract
This study aims to develop a rapid and non-destructive method for determining protein content in maize using near-infrared spectroscopy (NIRS). To mitigate the effects of surface irregularities and uneven protein distribution in whole kernels on spectral measurements, maize powder was used as the test material to improve the uniformity and stability of the spectral signals. A total of 90 maize powder samples were collected from major production regions across China, and a custom NIRS acquisition system was constructed. To optimize the spectral data, eight preprocessing methods, including Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), First Derivative (1D), Savitzky-Golay smoothing (S-G), and their combinations, were systematically evaluated. Traditional machine learning models (Partial Least Squares Regression, PLSR; Support Vector Machine, SVM) and deep learning models (ResNet-18, Transformer) were then developed to predict protein content, and their performance was compared. The combined preprocessing strategy of First Derivative and Multiplicative Scatter Correction (1D + MSC) was the most effective. Among the models, PLSR gave the best predictive performance, and the traditional chemometric methods showed greater practical utility than the deep learning models. To further improve model efficiency, four feature wavelength selection methods were applied: Partial Least Squares Regression Coefficients (PLSRC), Competitive Adaptive Reweighted Sampling (CARS), Successive Projections Algorithm (SPA), and Uninformative Variable Elimination (UVE). The PLSR model combined with SPA yielded the best performance, achieving a validation set correlation coefficient (Rp) of 0.927, a root mean square error of prediction (RMSEP) of 0.301, and a residual predictive deviation (RPD) of 2.502, along with the fastest computational speed.
This study provides a reliable technical solution and theoretical foundation for the rapid and non-destructive detection of protein content in maize, and it confirms that using powdered samples improves the accuracy of NIRS detection.
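
As an illustration of the quantities named in the abstract, the following is a minimal sketch of 1D + MSC preprocessing and of the three reported metrics (Rp, RMSEP, RPD). It is not the authors' implementation. The function names are hypothetical, the derivative is a simple finite difference, the derivative is assumed to be applied before MSC, and RPD is assumed to be the sample standard deviation of the reference values divided by RMSEP; the abstract does not specify any of these details.

```python
import numpy as np

def first_derivative(spectra):
    """Finite-difference first derivative along the wavelength axis (assumed form of 1D)."""
    return np.gradient(np.asarray(spectra, dtype=float), axis=1)

def msc(spectra, reference=None):
    """Multiplicative Scatter Correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum by
    default), and the fitted offset and slope are removed.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected

def preprocess_1d_msc(spectra):
    """Assumed order for '1D + MSC': derivative first, then scatter correction."""
    return msc(first_derivative(spectra))

def prediction_metrics(y_true, y_pred):
    """Return (Rp, RMSEP, RPD) for a prediction set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r_p = np.corrcoef(y_true, y_pred)[0, 1]            # Pearson correlation coefficient
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean square error of prediction
    rpd = np.std(y_true, ddof=1) / rmsep               # assumed definition: SD / RMSEP
    return r_p, rmsep, rpd
```

Under this assumed RPD definition, a larger spread of reference protein values relative to the prediction error gives a higher RPD, which is why RPD is usually read alongside RMSEP rather than in place of it.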