Abstract
Salvia miltiorrhiza is a widely used Chinese medicinal herb whose quality is significantly influenced by geographical origin, so reliable methods for origin identification are crucial for quality assurance. In this study, 67 batches of Salvia miltiorrhiza samples from Shandong, Shanxi, Henan, and Sichuan provinces were analyzed using near-infrared (NIR) and mid-infrared (MIR) spectroscopy combined with chemometric techniques. Six preprocessing methods were applied to optimize the spectral data, and partial least squares discriminant analysis (PLS-DA) models were constructed on the optimized spectra. To further improve model performance, uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), and random forest (RF) were employed for variable selection. Discriminant models were then established using NIR, MIR, and fused (NIR + MIR) data, and their performance was evaluated by classification accuracy. With NIR data, the 2nd-RF-PLS-DA model performed best, at 96.72% accuracy; with MIR data, the SG-UVE-PLS-DA model reached 98.33% accuracy. With the fused NIR and MIR data, the 2nd-UVE-PLS-DA model achieved 100% accuracy, the highest of all models. These findings indicate that combining NIR and MIR spectroscopy with appropriate preprocessing and variable selection exploits complementary spectral information and enables rapid, reliable, and efficient discriminant models. The approach provides an effective tool for origin tracing of Salvia miltiorrhiza and a methodological reference for quality evaluation of other Chinese herbal medicines.
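To make the fused-data workflow concrete, the following is a minimal sketch rather than the study's implementation. It assumes that "2nd" denotes second-derivative preprocessing (computed here with a Savitzky-Golay filter), that fusion means simple concatenation of the preprocessed NIR and MIR blocks, and that PLS-DA is PLS regression on one-hot class labels followed by an argmax. The variable selection step (UVE, CARS, or RF) is omitted. All data are synthetic, and every function name and parameter value (window width, number of components, peak positions) is hypothetical.

```python
import numpy as np

def savgol_second_derivative(X, window=11, polyorder=2):
    """Savitzky-Golay second derivative of each row (window edges are dropped)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit over the window: row 2 of the pseudo-inverse
    # yields the quadratic coefficient a2, and the second derivative is 2 * a2.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = 2.0 * np.linalg.pinv(design)[2]
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in X])

def plsda_fit(X, labels, n_classes, n_components):
    """PLS2 regression on one-hot labels, with deflation after each component."""
    Y = np.eye(n_classes)[labels]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the X-Y cross-product.
        w = np.linalg.svd(Xr.T @ Yr, full_matrices=False)[0][:, 0]
        t = Xr @ w                      # scores
        p = Xr.T @ t / (t @ t)          # X loadings
        q = Yr.T @ t / (t @ t)          # Y loadings
        Xr = Xr - np.outer(t, p)
        Yr = Yr - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.array(m).T for m in (W, P, Q))
    B = W @ np.linalg.solve(P.T @ W, Q.T)   # regression coefficients
    return x_mean, y_mean, B

def plsda_predict(model, X):
    """Assign each sample to the class with the largest predicted response."""
    x_mean, y_mean, B = model
    return np.argmax((X - x_mean) @ B + y_mean, axis=1)

def simulate_block(labels, n_vars, centers, rng):
    """Synthetic spectra: one class-specific peak plus a random baseline offset."""
    grid = np.arange(n_vars)
    peaks = np.exp(-0.5 * ((grid - centers[labels][:, None]) / 5.0) ** 2)
    baseline = rng.normal(0.0, 0.2, size=(labels.size, 1))
    noise = rng.normal(0.0, 0.01, size=(labels.size, n_vars))
    return peaks + baseline + noise

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 15)    # four hypothetical origins, 15 samples each
nir = simulate_block(labels, 100, np.array([20, 40, 60, 80]), rng)
mir = simulate_block(labels, 120, np.array([25, 50, 75, 100]), rng)

# Fusion as assumed here: preprocess each block, then concatenate the variables.
fused = np.hstack([savgol_second_derivative(nir), savgol_second_derivative(mir)])

test = np.arange(labels.size) % 3 == 0  # hold out every third sample
model = plsda_fit(fused[~test], labels[~test], n_classes=4, n_components=3)
pred = plsda_predict(model, fused[test])
accuracy = np.mean(pred == labels[test])
print(f"held-out accuracy on synthetic data: {accuracy:.2%}")
```

The synthetic classes are deliberately easy to separate, so the printed accuracy says nothing about real spectra. In practice, library routines such as SciPy's `savgol_filter` and scikit-learn's `PLSRegression` would likely replace the hand-written helpers, and the number of latent variables would be chosen by cross-validation.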