Abstract
Background/Objectives: Ultrasound imaging is widely used to assess kidney health and diagnose renal diseases. Accurate segmentation of renal structures in ultrasound images plays a critical role in the diagnosis and treatment of kidney diseases. However, speckle noise and low contrast still hinder precise segmentation. Methods: We propose an encoder-decoder architecture, named MAT-UNet, which incorporates two distinct attention mechanisms to enhance segmentation accuracy. Specifically, the multi-convolution pixel-wise attention module uses pixel-wise attention to help the network focus on important features at each stage. In addition, the triple-branch multi-head self-attention mechanism uses different convolution layers to obtain diverse receptive fields and capture global contextual information, compensating for the limited local receptive field of convolution operations and improving segmentation performance. We evaluate MAT-UNet on the Open Kidney US Data Set (OKUD). Results: For renal capsule segmentation, MAT-UNet achieves a Dice Similarity Coefficient (DSC) of 93.83%, a 95% Hausdorff Distance (HD95) of 32.02 mm, an Average Surface Distance (ASD) of 9.80 mm, and an Intersection over Union (IOU) of 88.74%. For central echo complex segmentation, it achieves a DSC of 84.34%, an HD95 of 35.79 mm, an ASD of 11.17 mm, and an IOU of 74.26%; for renal medulla segmentation, a DSC of 66.34%, an HD95 of 82.54 mm, an ASD of 19.52 mm, and an IOU of 51.78%; and for renal cortex segmentation, a DSC of 58.93%, an HD95 of 107.02 mm, an ASD of 21.69 mm, and an IOU of 43.61%. Conclusions: The experimental results demonstrate that the proposed MAT-UNet achieves superior performance in segmenting multiple renal structures in ultrasound images.
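
For readers unfamiliar with the two overlap metrics reported above, the following is a minimal sketch of how DSC and IOU are commonly computed for a single pair of binary masks. It is not the evaluation code used for MAT-UNet; the function name `overlap_metrics`, the toy masks, and the convention for two empty masks are illustrative assumptions. HD95 and ASD are boundary-distance metrics and are not covered here.

```python
from itertools import chain


def overlap_metrics(pred, target):
    """Return (DSC, IoU) for two binary masks given as 2-D lists of 0/1.

    Hypothetical helper for illustration only.
    """
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(target))
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0

    dsc = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dsc, iou


# Toy 3x4 masks (made up for this example, not taken from OKUD).
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0]]

dsc, iou = overlap_metrics(pred, target)
print(f"DSC = {dsc:.4f}, IoU = {iou:.4f}")  # DSC = 0.7500, IoU = 0.6000
```

For a single mask pair the two metrics are tied by IoU = DSC / (2 - DSC). Scores averaged over many images need not satisfy this identity exactly.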