Abstract
Baijiu is a traditional Chinese alcoholic beverage with significant economic and cultural value. Determining ethanol concentration with machine learning-based Raman spectroscopy is rapid and contact-free, and the technique holds considerable potential for baijiu quality control in industrial manufacturing. However, current applications of Raman spectroscopy to the quantitative analysis of biochemical materials are limited by measurement accuracy and by the flexibility and robustness of chemometric tools. To address these issues, we propose a method that combines graph-regularized principal component analysis (graph-regularized PCA) with random forest, an ensemble learning framework, to capture effective low-dimensional representations of high-dimensional Raman spectral data while reducing the instability of the spectral data. Furthermore, we propose a protocol that uses ethanol solutions of various concentrations as the training set for fitting a single regression model, which is then used to determine the ethanol concentrations of different types of baijiu. Across all three types of baijiu tested, our proposed method achieves a mean absolute percentage error (MAPE) of 0.415% in ethanol concentration determination, outperforming all other methods compared. The results validate the accuracy and robustness of our proposed method.
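For readers unfamiliar with the reported metric, MAPE can be computed as in the minimal sketch below. The function name `mape` and the sample concentrations are illustrative assumptions, not values or code from this work; the sketch only shows the standard definition, the mean of |reference - predicted| / |reference|, expressed in percent.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Assumes every reference value in y_true is non-zero, which holds
    for ethanol concentrations of alcoholic beverages.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative absolute errors, then scale to percent.
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical reference and predicted ethanol concentrations (% vol),
# invented purely for illustration.
reference = [40.0, 50.0, 52.0]
predicted = [40.2, 49.8, 52.26]

print(f"MAPE = {mape(reference, predicted):.3f}%")
```

Because each error is normalized by its own reference value, a MAPE figure is comparable across samples with different nominal ethanol concentrations.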