Abstract
Existing remote sensing change detection methods often struggle to capture the contours of complex change targets and subtle textural differences, which makes it difficult to separate the boundaries of change targets from the background. To address this challenge, we propose spatial-frequency decoupling alignment encoding (SDA-Encoding), a method designed to exploit information from both the spatial and frequency domains. Specifically, we first use a Transformer encoder to extract bi-temporal features and then apply a wavelet transform to decouple these features into low-frequency and high-frequency components. The multi-scale high-frequency interaction (MHI) module combines local spatial enhancement based on spatial pyramid pooling with cross-scale dependency supplementation provided by the dual-domain alignment fusion (DAF) module. The position-aware low-frequency enhancement (PLE) module restores spatial position sensitivity using coordinate attention and captures region-level contextual dependencies through the selective fusion attention (SFA) module. Finally, the two frequency branches are fused in the spatial domain so that they complement each other, yielding unified detection of both fine-grained and structural changes. Experimental results on three benchmark datasets demonstrate that SDA-Encoding delivers significant performance improvements.
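To make the frequency decoupling step concrete, the following is a minimal, dependency-free sketch of a single-level 2D wavelet decomposition and its inverse. It rests on several assumptions: the abstract does not name the wavelet family, so the Haar wavelet is used purely for illustration; the function names `haar_dwt2` and `haar_idwt2` are hypothetical; and the input is a small 2D list rather than the Transformer feature maps the method operates on.

```python
def haar_dwt2(x):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns (low, (d_row, d_col, d_diag)): one low-frequency band and three
    high-frequency detail bands, each half the input size per axis.
    Illustrative assumption: the wavelet used by the method is not specified.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, d_row, d_col, d_diag = [], [], [], []
    for i in range(0, h, 2):
        lo_r, dr_r, dc_r, dd_r = [], [], [], []
        for j in range(0, w, 2):
            # The four values of one 2x2 block.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            lo_r.append((a + b + c + d) / 2)  # scaled block sum (low frequency)
            dr_r.append((a + b - c - d) / 2)  # top row minus bottom row
            dc_r.append((a - b + c - d) / 2)  # left column minus right column
            dd_r.append((a - b - c + d) / 2)  # diagonal difference
        low.append(lo_r)
        d_row.append(dr_r)
        d_col.append(dc_r)
        d_diag.append(dd_r)
    return low, (d_row, d_col, d_diag)


def haar_idwt2(low, high):
    """Invert haar_dwt2, rebuilding the full-resolution 2D list."""
    d_row, d_col, d_diag = high
    out = []
    for i in range(len(low)):
        top, bottom = [], []
        for j in range(len(low[0])):
            s, r, c, g = low[i][j], d_row[i][j], d_col[i][j], d_diag[i][j]
            top += [(s + r + c + g) / 2, (s + r - c - g) / 2]
            bottom += [(s - r + c - g) / 2, (s - r - c + g) / 2]
        out += [top, bottom]
    return out


# A flat 2x2 block next to a checkerboard-textured 2x2 block.
x = [[4, 4, 8, 0],
     [4, 4, 0, 8]]
low, high = haar_dwt2(x)
restored = haar_idwt2(low, high)
```

In this toy input, both blocks yield the same low-frequency value, and only the diagonal detail band tells the textured block apart from the flat one. This is the kind of subtle textural difference that a separate high-frequency branch can expose, while the inverse transform shows that the two sets of components together lose no information.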