Abstract
Crop classification plays a vital role in mapping the spatial distribution of crops, improving the efficiency of agricultural management, and ensuring food security. With the continuous advancement of remote sensing technologies, efficient and accurate crop classification from remote sensing imagery has become a prominent research focus. Conventional approaches largely rely on empirical rules or single-feature selection (e.g., NDVI or VV) for temporal feature extraction and lack systematic optimization of multimodal feature combinations from optical and radar data. To address this limitation, this study proposes a crop classification method based on feature-level fusion of multimodal remote sensing data, integrating the complementary advantages of optical and SAR imagery to overcome the temporal and spatial representation constraints of single-sensor observations. The study was conducted in Story County, Iowa, USA, and focused on the growth cycles of corn and soybean. Eight vegetation indices (including NDVI and NDRE) and five polarimetric features (including VV and VH) were constructed and analyzed. Feature importance was assessed with a random forest algorithm, which identified NDVI+NDRE and VV+VH as the optimal feature combinations. Subsequently, 16 scenes of optical imagery (Sentinel-2) and 30 scenes of radar imagery (Sentinel-1) were fused at the feature level to generate a multimodal temporal feature image with 46 channels. Using Cropland Data Layer (CDL) samples as reference data, a U-Net deep neural network was employed for refined crop classification, and its results were compared with those of single-modal models. Experimental results demonstrated that the fusion model outperformed the single-modal approaches in classification accuracy, boundary delineation, and consistency, achieving training, validation, and test accuracies of 95.83%, 91.99%, and 90.81%, respectively.
Furthermore, consistent improvements were observed across evaluation metrics, including F1-score, precision, and recall.
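As an illustration of the feature-level fusion step, the following minimal sketch stacks co-registered per-scene feature images along the channel axis. It is not the authors' implementation: the function names (`normalized_difference`, `fuse_feature_level`) are hypothetical, and it assumes one feature band per scene, which is inferred only from the stated channel count (16 + 30 = 46). How the paired features (NDVI+NDRE, VV+VH) are reduced to one band per scene is not specified here and is left out.

```python
from typing import List

# One single-band feature image on a fixed pixel grid: rows x cols.
Image = List[List[float]]


def normalized_difference(a: float, b: float) -> float:
    """Generic normalized difference (a - b) / (a + b).

    NDVI uses a = NIR, b = red; NDRE uses a = NIR, b = red edge.
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def fuse_feature_level(optical: List[Image], sar: List[Image]) -> List[Image]:
    """Concatenate optical and SAR scenes into one multi-channel stack.

    Assumption: every scene is already co-registered to the same grid
    and contributes exactly one feature band.
    """
    scenes = list(optical) + list(sar)
    if not scenes:
        raise ValueError("at least one scene is required")
    rows, cols = len(scenes[0]), len(scenes[0][0])
    for img in scenes:
        if len(img) != rows or any(len(r) != cols for r in img):
            raise ValueError("all scenes must share the same pixel grid")
    return scenes  # channel-first layout: [channel][row][col]


def blank(rows: int, cols: int, value: float) -> Image:
    """Helper for the usage example: a constant-valued image."""
    return [[value] * cols for _ in range(rows)]


# Usage example with placeholder values (not real imagery).
optical_scenes = [blank(2, 3, 0.5) for _ in range(16)]  # e.g. Sentinel-2 dates
sar_scenes = [blank(2, 3, -12.0) for _ in range(30)]    # e.g. Sentinel-1 dates
fused = fuse_feature_level(optical_scenes, sar_scenes)
print(len(fused))
```

In this layout each pixel carries a 46-element temporal feature vector, which is the kind of input a segmentation network such as U-Net accepts as a multi-channel image.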