Abstract
BACKGROUND: 3D medical image segmentation is a cornerstone of quantitative analysis and clinical decision-making across imaging modalities. However, acquiring high-quality voxel-level annotations is both time-consuming and labor-intensive. Semi-supervised learning (SSL) offers an appealing remedy: it leverages limited labeled data together with abundant unlabeled data to improve segmentation performance under clinical data constraints.
METHODS: We propose a foundation model-driven multi-view collaborative learning framework that exploits the zero-shot capabilities of SAM-like foundation models to learn jointly from the axial, sagittal, and coronal planes. A collaborative fusion module integrates complementary representations across views, strengthening 3D structural understanding and improving segmentation performance at limited annotation cost.
RESULTS: Extensive experiments on two evaluation datasets, MRI brain tumor segmentation and whole-body PET heart segmentation, show that the proposed method consistently outperforms existing SAM-based semi-supervised approaches. The multi-view collaborative design not only sharpens boundary precision for organ and tumor delineation but also transfers well across imaging modalities.
CONCLUSION: This study presents a foundation model-driven, multi-view collaborative learning paradigm for semi-supervised 3D medical image segmentation, offering a scalable and clinically meaningful solution that reduces annotation dependency while maintaining high segmentation accuracy across diverse imaging modalities.
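To make the multi-view idea concrete, the sketch below shows one generic way such a scheme can be realized: a 3D volume is sliced along each anatomical plane, a per-view 2D segmenter predicts each slice, and the three per-view probability volumes are averaged. This is a minimal illustration under assumed interfaces, not the paper's implementation; the `nn.Conv2d` "segmenters" are hypothetical stand-ins for SAM-like per-view models, and the averaging fusion is a placeholder for the collaborative fusion module.

```python
# Minimal sketch (not the paper's method): fuse per-plane 2D predictions
# from axial/sagittal/coronal views into one 3D probability volume.
import torch
import torch.nn as nn

def predict_per_plane(model: nn.Module, volume: torch.Tensor, dim: int) -> torch.Tensor:
    """Run a 2D model over every slice along `dim` of a (D, H, W) volume
    and reassemble the per-slice foreground probabilities into 3D."""
    batch = torch.stack(volume.unbind(dim)).unsqueeze(1)  # (N, 1, h, w)
    probs = torch.sigmoid(model(batch)).squeeze(1)        # (N, h, w)
    return probs.movedim(0, dim)                          # back to (D, H, W)

# Hypothetical stand-ins for SAM-like per-view decoders (for illustration only).
seg_axial, seg_sagittal, seg_coronal = (nn.Conv2d(1, 1, 1) for _ in range(3))

volume = torch.randn(32, 64, 64)                          # toy (D, H, W) scan
views = [
    predict_per_plane(seg_axial, volume, dim=0),          # axial slices
    predict_per_plane(seg_sagittal, volume, dim=1),       # sagittal slices
    predict_per_plane(seg_coronal, volume, dim=2),        # coronal slices
]
fused = torch.stack(views).mean(0)                        # simple average fusion
mask = (fused > 0.5).float()                              # final 3D segmentation
```

In a semi-supervised setting, such fused predictions on unlabeled volumes are typically used as pseudo-labels or consistency targets for the individual views, which is the general role the collaborative fusion plays here.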