Abstract
Rhubarb is widely used in food, medicine, and industry. As wild resources decline, cultivated rhubarb is increasingly replacing them. However, its quality varies by region, so production areas need systematic evaluation. This study measured the contents of five active compounds in 235 wild R. tanguticum samples collected from 46 sites. These data were used to compare a linear model (MLM) with three machine learning models: Random Forest (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbors (KNN). For total anthraquinones, the machine learning models outperformed the linear model, with lower root mean square error (RMSE: MLM, 0.75; RF, 0.57; XGB, 0.60; KNN, 0.59), lower mean absolute error (MAE: MLM, 0.59; RF, 0.44; XGB, 0.47; KNN, 0.46), and higher R² (MLM, 0.23; RF, 0.56; XGB, 0.51; KNN, 0.53). RF performed best on all three metrics and was selected for the final predictions. Its predictions indicate that Xining and its surrounding areas have the highest contents of total anthraquinones (2.5–3.5%), sennoside A (0.4–1.2%), sennoside B (0.8–1.3%), and gallic acid (0.15–0.37%) in wild R. tanguticum. Field cultivation at four sites confirmed the model's accuracy. By integrating field sampling, model simulation, and cultivation validation, this study identifies optimal regions for cultivating high-quality R. tanguticum, supporting the sustainable utilization and industrial development of rhubarb resources.
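The abstract compares models with RMSE, MAE, and R² but does not define them. The sketch below assumes the standard definitions and shows one way the reported total-anthraquinone metrics could be used to rank the four models. The selection rule (lowest RMSE and MAE, highest R²) is an illustrative assumption. The abstract states only that RF was selected. The toy arrays in the usage section are hypothetical values, not study data.

```python
import math
from typing import Dict, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Total-anthraquinone metrics as reported in the abstract.
REPORTED: Dict[str, Dict[str, float]] = {
    "MLM": {"RMSE": 0.75, "MAE": 0.59, "R2": 0.23},
    "RF": {"RMSE": 0.57, "MAE": 0.44, "R2": 0.56},
    "XGB": {"RMSE": 0.60, "MAE": 0.47, "R2": 0.51},
    "KNN": {"RMSE": 0.59, "MAE": 0.46, "R2": 0.53},
}


def best_model(metrics: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Hypothetical selection rule: lowest RMSE and MAE, highest R2."""
    return {
        "RMSE": min(metrics, key=lambda m: metrics[m]["RMSE"]),
        "MAE": min(metrics, key=lambda m: metrics[m]["MAE"]),
        "R2": max(metrics, key=lambda m: metrics[m]["R2"]),
    }


if __name__ == "__main__":
    # Toy, made-up values (not study data) illustrating the metric definitions.
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.0, 2.0, 4.0]
    print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
    print(best_model(REPORTED))
```

With the reported values, RF ranks first on every metric, which is consistent with its selection. Lower RMSE and MAE mean smaller prediction errors, and a higher R² means more of the variance in measured content is explained.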