Abstract
Existing semantic segmentation methods for road extraction from remote sensing imagery often struggle with limited long-range dependency modeling and semantic redundancy in feature fusion, problems compounded by the training instability associated with standard activation functions such as ReLU. To address these challenges, we propose DS-Unet, an architecture that redesigns both feature fusion and activation. It integrates two core innovations: 1) the Complementary Attention Fusion Module (CAFM), which replaces standard skip connections and dynamically balances feature distinctiveness (SDA), to suppress false positives, against global context (GCA), to enhance connectivity; and 2) the SUGAR activation function, which introduces smooth surrogate gradients to resolve the 'dying neuron' issue, improving training stability and fine-grained feature expression. Extensive experiments against 17 state-of-the-art methods show that DS-Unet outperforms them, reaching 75.08% IoU (84.25% F1) on the Massachusetts Road Dataset and 79.25% IoU (87.21% F1) on the DeepGlobe Road Dataset. These results establish DS-Unet as a robust solution for high-precision road extraction.
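To make the surrogate-gradient idea concrete, the following is a minimal pure-Python sketch, not the SUGAR implementation itself. It keeps the ReLU forward pass and swaps only the backward derivative for a smooth one. The choice of surrogate (the logistic sigmoid, i.e. the derivative of softplus) and the helper names `hard_grad`, `surrogate_grad`, and `sgd_step` are assumptions made for illustration; the surrogate actually used by SUGAR may differ.

```python
import math


def relu(z: float) -> float:
    """Forward pass: standard ReLU, unchanged."""
    return z if z > 0.0 else 0.0


def hard_grad(z: float) -> float:
    """Exact ReLU derivative: zero for every non-positive pre-activation."""
    return 1.0 if z > 0.0 else 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def surrogate_grad(z: float) -> float:
    """Assumed surrogate: derivative of softplus, which is sigmoid(z).

    It is strictly positive, so an inactive neuron still receives a
    gradient signal. This is an illustrative choice only.
    """
    return sigmoid(z)


def sgd_step(w: float, x: float, target: float, grad_fn, lr: float = 0.1) -> float:
    """One SGD step for y = relu(w * x) with loss 0.5 * (y - target)**2.

    grad_fn stands in for d relu / d z in the backward pass.
    """
    z = w * x
    y = relu(z)
    dw = (y - target) * grad_fn(z) * x
    return w - lr * dw


# A "dead" neuron: the pre-activation is negative, so the output is 0.
w0, x0, t0 = -1.0, 1.0, 1.0
w_hard = sgd_step(w0, x0, t0, hard_grad)        # weight does not move
w_surr = sgd_step(w0, x0, t0, surrogate_grad)   # weight moves toward activation
```

With the exact derivative the weight of the inactive neuron stays where it is, whereas the smooth surrogate nudges it toward the active region while leaving the forward output identical. In a deep-learning framework the same pattern would typically be expressed as a custom backward rule rather than a hand-written update.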