Abstract
BACKGROUND: Accurate identification of bone structures helps surgeons locate target areas during procedures and reduces damage to surrounding tissue. To automate the laborious task of labeling vertebrae in computed tomography (CT) images, we propose DBU-Net, a deep learning segmentation network built upon the nnU-Net framework. METHODS: DBU-Net incorporates a multi-scale feature channel attention module and a dual-branch decoder architecture. The multi-scale feature channel attention module combines image features from adjacent stages, integrating detailed and structural information across scales; a weighted operation adaptively adjusts the importance of each channel, enabling precise extraction of multi-scale features. The dual-branch decoder, built upon the nnU-Net decoder, adds a branch with a contextual Transformer module to capture global contextual information. At each decoding stage, the features from the two branches interact, merging global context with local features. This significantly enhances the network's ability to process complex features in spinal CT images and improves segmentation accuracy. RESULTS: We conducted a comprehensive evaluation on the Vertebrae Segmentation (VerSe) datasets from the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2019 and 2020 challenges. DBU-Net achieves state-of-the-art performance with an average Dice coefficient of 94.59%. CONCLUSIONS: These results highlight DBU-Net's potential to help surgeons identify and locate spinal structures clearly and accurately, and the method is expected to provide robust technical support for the precise execution of spine-related surgeries and the effective diagnosis of disease.
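To make the channel-weighting idea in the METHODS concrete, the following is a minimal NumPy sketch of fusing features from two adjacent decoder stages and reweighting channels. The single-linear-layer excitation, sigmoid gating, nearest-neighbor upsampling, and all shapes here are illustrative assumptions for exposition, not the paper's exact module design.

```python
import numpy as np

def channel_attention(features, w, b):
    """Reweight channels of a (C, H, W) feature map.

    Assumed design: global average pooling per channel ("squeeze"),
    then a hypothetical single linear layer + sigmoid producing
    per-channel weights in (0, 1) ("excitation").
    """
    squeeze = features.mean(axis=(1, 2))                 # (C,)
    weights = 1.0 / (1.0 + np.exp(-(w @ squeeze + b)))   # (C,), each in (0, 1)
    return features * weights[:, None, None]             # rescale each channel

def fuse_adjacent_stages(deep, shallow, w, b):
    """Combine a coarser (C, H/2, W/2) map with a finer (C, H, W) map."""
    # Nearest-neighbor upsample the deeper, lower-resolution features.
    up = deep.repeat(2, axis=1).repeat(2, axis=2)        # (C, H, W)
    # Stack detailed (shallow) and structural (deep) information channel-wise.
    fused = np.concatenate([up, shallow], axis=0)        # (2C, H, W)
    # Adaptively adjust the importance of each fused channel.
    return channel_attention(fused, w, b)

rng = np.random.default_rng(0)
shallow = rng.normal(size=(4, 8, 8))
deep = rng.normal(size=(4, 4, 4))
out = fuse_adjacent_stages(deep, shallow, np.eye(8), np.zeros(8))
print(out.shape)  # (8, 8, 8): fused multi-scale features at the finer resolution
```

In a trained network, `w` and `b` would be learned parameters; because the gate values lie in (0, 1), each channel's magnitude can only be attenuated relative to the fused input, letting the module emphasize informative scales and suppress others.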