Abstract
BACKGROUND: Accurate thigh muscle segmentation from magnetic resonance imaging (MRI) enables quantitative assessment of muscle health. Although manual segmentation is the gold standard, it is labor-intensive and variable, and existing automated and semi-automatic approaches remain limited by segmentation errors or user dependence, which restricts scalability. Defining the data requirements for robust automated segmentation therefore remains a critical unmet need.
PURPOSE: To determine the number of annotated lower extremity MRI studies needed to train an accurate deep learning (DL) model for thigh muscle segmentation, and to assess the effect of training set size on the agreement of downstream quantitative measures.
MATERIALS AND METHODS: Lower extremity MR images were obtained from competitive athletes with anterior cruciate ligament injuries and professional-level football athletes, all scanned at a single site on a 3 T GE Premier system. Fourteen thigh muscles were segmented using semi-automatic propagation followed by manual correction to generate high-quality, assisted manual ground-truth segmentations (Seg(M)). Thirteen DL models (nnU-Net) were trained with Seg(M) on increasing numbers of training subjects (N(train)), ranging from N(train) = 5 to N(train) = 120, and each was evaluated on a fixed independent test set of 41 subjects. Automated segmentation (Seg(A)) performance was evaluated using standard geometric accuracy metrics: Dice similarity coefficient (DSC), relative volume difference (RVD), Hausdorff distance (HD), 95th-percentile Hausdorff distance (HD95), and average symmetric surface distance (ASSD). To determine whether Seg(A) would yield meaningful quantitative MRI results, we also compared fat fraction and diffusion-tensor imaging measures extracted from Seg(A) with those derived from Seg(M).
RESULTS: The DL model trained on N(train) = 20 subjects achieved high accuracy on the fixed test set (mean ± SD: DSC 0.94 ± 0.02; RVD 4.9% ± 5.2%; ASSD 0.8 ± 0.4 mm; HD95 3.2 ± 2.8 mm), with only modest improvement at N(train) = 50.
CONCLUSION: Twenty annotated MRI studies were sufficient for clinically acceptable segmentation performance, supporting streamlined segmentation and quantitative reporting in athlete care and research.
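For readers unfamiliar with the overlap and volume metrics reported above, the following is a minimal illustrative sketch of DSC and RVD, not the study's evaluation code. It assumes each segmentation is represented as a set of (x, y, z) voxel indices and that RVD is taken as the absolute volume difference relative to the reference; the abstract does not state which RVD convention was used. The function names and the toy masks are hypothetical.

```python
from itertools import product


def dice_coefficient(seg_a, seg_m):
    """DSC = 2|A ∩ M| / (|A| + |M|) for two voxel-index sets."""
    a, m = set(seg_a), set(seg_m)
    if not a and not m:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * len(a & m) / (len(a) + len(m))


def relative_volume_difference(seg_a, seg_m):
    """RVD in percent: 100 * ||A| - |M|| / |M| (absolute form, assumed here)."""
    a, m = set(seg_a), set(seg_m)
    if not m:
        raise ValueError("reference segmentation is empty")
    return 100.0 * abs(len(a) - len(m)) / len(m)


# Toy example: the reference is a 2x2x2 cube (8 voxels); the automated mask
# covers the same cube plus one extra slice (12 voxels).
seg_m = set(product(range(1, 3), range(1, 3), range(1, 3)))
seg_a = set(product(range(1, 3), range(1, 3), range(1, 4)))

dsc = dice_coefficient(seg_a, seg_m)
rvd = relative_volume_difference(seg_a, seg_m)
print(f"DSC = {dsc:.2f}, RVD = {rvd:.1f}%")  # DSC = 0.80, RVD = 50.0%
```

With volumetric arrays, the same quantities follow from counting nonzero voxels in each label map; the surface-distance metrics (HD, HD95, ASSD) additionally require voxel spacing and are not sketched here.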