Abstract
Semantic segmentation of remote sensing images (RSI) has gained significant attention in recent years due to its critical role in geospatial analysis, agriculture, and forestry. However, existing RSI segmentation methods face several challenges: (1) limited dataset diversity and inadequate coverage of traditional village landscapes, which leaves these unique environments underrepresented in geospatial data; (2) inefficient same-layer and cross-layer feature fusion in convolutional neural networks (CNNs) and transformers, which leads to either insufficient spatial modeling or excessive computational demands; and (3) multimodal approaches that improve modeling accuracy but introduce high parameter complexity and computational overhead. To address these issues, we propose the Mamba Prompt Learning Network (MPLNet) for efficient and accurate RSI segmentation, with a strong emphasis on spatial information extraction and GIS-based applications. First, we construct TV-RSI, a highly diverse, large-scale dataset specifically designed to capture the spatial structures, topographic variations, and land use patterns of traditional villages. Second, we develop the Mamba Fusion Module, which improves geospatial feature utilization by efficiently modeling both intralayer and interlayer spatial relationships. Finally, we introduce prompt learning, which transfers bimodal geospatial knowledge from heavyweight networks into a lightweight unimodal model, improving segmentation accuracy while maintaining computational efficiency. Extensive experiments on TV-RSI and two publicly available RSI datasets demonstrate that MPLNet achieves state-of-the-art performance at significantly reduced computational cost, making it well suited to geospatial segmentation tasks in GIS-driven remote sensing applications.