Abstract
High-resolution remote sensing image segmentation plays a crucial role in fields such as environmental surveillance, disaster impact analysis, and spatial resource management, yet pronounced intra-class variability, intricate scene structures, and the substantial computational burden of modern deep models often impede practical use. To mitigate these limitations, this study introduces SAM2-ARAFNet, a segmentation framework derived from Segment Anything Model 2 (SAM2) and equipped with lightweight adapter modules for parameter-efficient fine-tuning, together with an Attention-Enhanced Residual Atrous Spatial Pyramid Pooling (ResASPP) component that enriches multi-scale semantic representation. For deployment on resource-limited platforms, a tailored knowledge distillation strategy is further employed to compress the fine-tuned SAM2 model into a compact student network based on EfficientNet_b0. Experiments conducted on the ISPRS Vaihingen and Potsdam benchmarks demonstrate clear performance gains: SAM2-ARAFNet attains mIoU values of 85.43% and 87.44%, exceeding widely used baselines such as PSPNet by 4.93 and 4.03 percentage points. In addition, the distilled student model reduces parameters by 97% (from 222.98 M to 6.68 M) while preserving more than 99% of the teacher network's accuracy. These results show that the framework delivers high-quality segmentation with markedly improved computational efficiency, confirming its suitability for edge-focused remote sensing scenarios.