Abstract
Accurate determination of soil organic carbon (SOC), a foundation of soil health that helps safeguard ecological and food security, is crucial for local agricultural production. We aimed to investigate the influence of soil texture on hyperspectral models for predicting SOC content and to evaluate how different preprocessing methods and feature band selection algorithms improve modeling efficiency. Laboratory-determined SOC content and hyperspectral reflectance data were obtained from soil samples collected in daylily cultivation areas in Yunzhou District, Datong City. Mathematical transformations, including Savitzky-Golay smoothing (SG), First Derivative (FD), Second Derivative (SD), Multiplicative Scatter Correction (MSC), and Standard Normal Variate (SNV), were applied to the spectral reflectance data. Feature bands extracted with the Successive Projections Algorithm (SPA) and Competitive Adaptive Reweighted Sampling (CARS) were used to build SOC content inversion models with four algorithms: Partial Least Squares Regression (PLSR), Random Forest (RF), Backpropagation Neural Network (BP), and Convolutional Neural Network (CNN). The results indicate the following: (1) Preprocessing effectively increased the correlation between soil spectral reflectance and SOC content. (2) SPA and CARS effectively screened the characteristic bands of SOC in daylily-cultivated soil from the spectral curves. SPA and CARS selected 4 to 11 and 9 to 122 bands, respectively, and both algorithms facilitated model construction. (3) Among all the constructed models, FD-CARS-PLSR performed best, with coefficients of determination (R²) of 0.93 and 0.83 for the training and validation sets, respectively, demonstrating high model stability and reliability. (4) Incorporating soil texture as an auxiliary variable into the PLSR inversion model improved the inversion accuracy, with accuracy gains of 0.01 to 0.05.
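To make some of the transformations and the evaluation metric named above concrete, the following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, FD is approximated by a simple band-to-band difference, SNV uses the population standard deviation, and MSC uses the mean spectrum as its reference. SG smoothing, band selection (SPA, CARS), and the regression models are not shown.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own
    mean and standard deviation (population std is an assumption here)."""
    m = mean(spectrum)
    s = pstdev(spectrum)
    return [(x - m) / s for x in spectrum]


def first_derivative(spectrum):
    """First derivative approximated as the difference between adjacent
    bands (assumes evenly spaced bands; the result has one fewer value)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def msc(spectra):
    """Multiplicative Scatter Correction: regress each spectrum on the
    mean spectrum, then remove the fitted offset and slope."""
    n_bands = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

In a workflow like the one described, such transformations would be applied to each reflectance spectrum before band selection, and `r_squared` would be computed separately on the training and validation sets.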