Abstract
Despite significant breakthroughs in deep learning, brain tumor segmentation remains challenging because of unclear tumor borders and the high accuracy required. To address these challenges, we propose a new segmentation model, SwinCLNet, which integrates window-based multi-head self-attention, shifted-window multi-head self-attention, cross-scale dual fusion, and residual large-kernel attention into the 3D U-Net architecture. First, the encoder employs the window-based and shifted-window multi-head self-attention modules to capture rich contextual information. Second, the decoder employs the cross-scale dual fusion module, which refines tumor boundary representation by fusing the encoder's attention-enhanced features across scales. Third, SwinCLNet applies the residual large-kernel attention module over the skip connections, using large-kernel attention to expand the receptive field and capture long-range spatial dependencies. Experiments on the BraTS 2023 and 2024 datasets demonstrate that the proposed SwinCLNet achieves excellent Dice scores and Hausdorff distances across all brain tumor segmentation regions. In particular, the proposed model increased the average Dice score by approximately 4.53% and reduced the 95th-percentile Hausdorff distance by approximately 30.89% relative to the average of the benchmark models. These results demonstrate that SwinCLNet is particularly effective on the challenging tumor core and enhancing tumor regions.