Abstract
BACKGROUND: Accurate segmentation of liver tumors on contrast-enhanced CT is essential for clinical decision-making, but it remains challenging because tumor boundaries are irregular and lesions are difficult to distinguish from blood vessels and bile ducts. 3D convolutional networks capture inter-slice context effectively, but they are computationally intensive and memory-demanding. 2D networks are efficient but cannot model volumetric context, which often results in discontinuous or inaccurate segmentations. 2.5D approaches that stack adjacent slices offer a compromise, but they fuse inter-slice information early, which weakens spatial discrimination.

PURPOSE: We propose Mixed U-Net, a hybrid segmentation architecture designed to extract fine-grained z-axis features while maintaining low computational cost.

METHODS: We embed a small number of 3D convolutional layers into a 2D convolutional U-Net using residual and skip connections. This allows the 3D convolutions to extract fine-grained spatial features at multiple depths in the network, approximating 3D segmentation within a lightweight framework. Mixed U-Net was trained and evaluated on a dataset of 532 liver tumor cases from Hospital A and externally validated on 45 cases from Hospital B.

RESULTS: Mixed U-Net achieved a Dice score of 81.54% (95% CI: 81.45%-81.62%) on the internal test set, outperforming multiple 2D, 3D, and 2.5D baselines. On the external dataset, it achieved a Dice score of 78.92%, exceeding the baseline by 4.67% and indicating better generalization.

CONCLUSIONS: By integrating 3D feature extraction into a primarily 2D architecture, Mixed U-Net balances contextual accuracy with computational efficiency, making it well-suited for clinically applicable liver tumor segmentation.
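For readers unfamiliar with the Dice score reported in the results, the following is a minimal sketch of how the metric is commonly computed for a pair of binary masks. It is an illustration only and not the evaluation code used in this study: the function name `dice_score`, the flattened-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    Dice = 2 * |pred AND target| / (|pred| + |target|)
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled as tumor in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total tumor voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 8 voxels flattened from a hypothetical volume.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
print(f"Dice = {dice_score(predicted, reference):.2%}")  # prints "Dice = 75.00%"
```

In the toy example, each mask contains 4 tumor voxels and 3 of them overlap, giving 2 × 3 / (4 + 4) = 0.75. A score of 100% would indicate that the predicted and reference masks coincide exactly.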