Defining ground truth for prostate segmentation of transrectal ultrasound images: Inter- and intra-observer variability of manual versus semi-automatic methods.

Authors: Lenfant Louis, Beitone Clément, Troccaz Jocelyne, Fiard Gaelle, Malavaud Bernard, Voros Sandrine, Mozer Pierre C
BACKGROUND: Accurate prostate segmentation in transrectal ultrasound (TRUS) imaging is essential for diagnosis, treatment planning, and the development of artificial intelligence (AI) algorithms. Although manual segmentation is often recommended as the ground truth for AI training, it is time-consuming, prone to inter- and intra-observer variability, and rarely used in everyday clinical practice. Semi-automatic methods provide a faster alternative but lack thorough multi-operator evaluations. Understanding variability in segmentation methods is crucial to defining a reliable reference standard for future AI training.
PURPOSE: To investigate the inter-individual variability in manual and semi-automatic prostate contour segmentation on 3D TRUS images, and to compare the two approaches to determine the most consistent method that could serve as a reference standard for future AI model development.
METHODS: This study is a methodological investigation and not an AI study. Four urology experts independently performed manual and semi-automatic segmentation on 100 prostate 3D TRUS exams obtained from patients undergoing fusion prostate biopsy. Inter-individual and intra-individual variability for manual segmentation was assessed using the Average Surface Distance (ASD) between manually placed points and a reference mesh. Two methods were used to create the reference prostate mesh after manual point positioning: a statistical shape model (manual_SSM) and a deformable model (manual_soft-SSM). Semi-automatic segmentations were evaluated using ASD, Dice similarity coefficient, and Hausdorff distance. A Simultaneous Truth and Performance Level Estimation (STAPLE)-like consensus method was applied to assess variability across experts in semi-automatic segmentation. Statistical comparisons used Wilcoxon tests, and effect sizes were calculated using Cohen's d. Bonferroni correction was applied for multiple comparisons. A significance level of P < 0.05 (adjusted as needed) was used.
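The three agreement metrics named in the methods (ASD, Dice similarity coefficient, Hausdorff distance) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the study: surfaces are represented as plain lists of 3D points rather than meshes, segmentations as sets of voxel indices, and ASD is computed symmetrically, whereas the study measured distances from manually placed points to a reference mesh. All function names are hypothetical.

```python
import math


def dice(voxels_a, voxels_b):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def average_surface_distance(surf_a, surf_b):
    """Symmetric ASD: mean nearest-neighbour distance taken in both directions."""
    d_ab = _nearest_distances(surf_a, surf_b)
    d_ba = _nearest_distances(surf_b, surf_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hausdorff_distance(surf_a, surf_b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour distance."""
    return max(
        max(_nearest_distances(surf_a, surf_b)),
        max(_nearest_distances(surf_b, surf_a)),
    )


if __name__ == "__main__":
    # Toy data only: two parallel two-point "surfaces" one unit apart.
    surf_a = [(0, 0, 0), (1, 0, 0)]
    surf_b = [(0, 0, 1), (1, 0, 1)]
    print(average_surface_distance(surf_a, surf_b))
    print(hausdorff_distance(surf_a, surf_b))
    print(dice([(0, 0, 0), (1, 0, 0)], [(1, 0, 0), (2, 0, 0)]))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is acceptable for a sketch but would likely need a spatial index for full-resolution meshes.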
RESULTS: Inter-individual variability of manual segmentation was higher with the manual_SSM method [ASD = 2.6 mm (interquartile range (IQR) 2.3-3.0)] than with the manual_soft-SSM method [ASD = 1.5 mm (IQR 1.2-1.8), P < 0.001]. Intra-individual variability was likewise lower with manual_soft-SSM than with manual_SSM [ASD = 1.0 mm (IQR 0.8-1.1) versus 2.2 mm (IQR 1.9-2.6), P < 0.001]. For semi-automatic segmentation, inter-individual variability yielded an ASD of 1.4 mm (IQR 1.1-1.9), a Dice coefficient of 0.90 (IQR 0.88-0.92), and a Hausdorff distance of 5.7 mm (IQR 4.47-7.36). Comparing manual with semi-automatic segmentation gave an ASD of 1.43 mm (IQR 1.20-1.90).
CONCLUSIONS: The semi-automatic segmentation method evaluated in this study demonstrated comparable accuracy to manual segmentation while reducing inter- and intra-individual variability. These findings suggest that the tested semi-automatic approach can serve as a reliable reference standard for AI training in prostate segmentation.
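The median (IQR) summaries reported in the results, and the effect size and multiple-comparison correction named in the methods, can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the quartile interpolation rule, the pooled-standard-deviation form of Cohen's d, and the function names are all choices made here, since the abstract does not specify them. The Wilcoxon test itself is left out because it would need a statistics library (for example `scipy.stats.wilcoxon`).

```python
import math
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartile interpolation (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation (one common variant, assumed here)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Toy ASD values in mm, invented purely for illustration.
    method_a = [2.4, 2.6, 2.9, 3.1, 2.5]
    method_b = [1.3, 1.5, 1.7, 1.9, 1.4]
    print(median_iqr(method_a))
    print(cohens_d(method_a, method_b))
    print(bonferroni_alpha(0.05, 3))
```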
