Abstract
This study proposes a novel model for liver and liver tumor segmentation. The architecture integrates BiFormer into the bottom two layers of the Attention U-Net encoder to strengthen global semantic context modeling and capture long-range pixel-wise dependencies. A proposed spatial-channel dual attention (SCDA) mechanism is incorporated into the first three encoder layers to refine fine-grained feature processing, particularly for precise delineation of liver and tumor boundaries. Finally, a Mix Structure Block (MSB) is implemented in the decoder to improve the fusion of deep semantic and shallow spatial features, thereby raising segmentation accuracy. Ablation experiments were conducted on three publicly available datasets. On the 3Dircadb dataset, the model achieved a mean Dice coefficient of 0.9377 and a mean IoU of 0.8889; on the LITS dataset, 0.9257 and 0.8704; and on the CHAOS dataset, 0.9611 and 0.9259. These results validate the effectiveness of the proposed network. By enabling precise, automated segmentation directly from raw sensor-acquired medical images, the proposed attention-based method enhances the diagnostic value of these imaging sensors and facilitates more accurate clinical decision-making.
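For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a Dice coefficient and an IoU are commonly computed for a pair of binary masks. It is not the evaluation code used in this study: the function name `dice_and_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Flatten both masks so that any 2D slice shape is accepted.
    p = [int(v) for row in pred for v in row]
    t = [int(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(a & b for a, b in zip(p, t))
    pred_area, target_area = sum(p), sum(t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 example: 3 predicted pixels, 3 reference pixels, 2 shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two scores are related by Dice = 2·IoU / (1 + IoU). That identity does not generally carry over to dataset-level means, because averaging is applied to each metric separately.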