A Comparison of CT-Based Pancreatic Segmentation Deep Learning Models

Abstract

RATIONALE AND OBJECTIVES: Accurate pancreas segmentation on CT is critical for identifying pancreatic pathologies and essential for developing imaging biomarkers. Our objective was to benchmark the performance of five high-performing pancreas segmentation models across multiple metrics, stratified by scan and patient/pancreatic characteristics that may affect segmentation performance.

MATERIALS AND METHODS: In this retrospective study, PubMed and arXiv searches were conducted to identify pancreas segmentation models, which were then evaluated on a set of annotated imaging datasets. Results (Dice score, Hausdorff distance [HD], average surface distance [ASD]) were stratified by contrast status and by quartiles of peri-pancreatic attenuation (attenuation within a 5-mm region around the pancreas). Multivariate regression was performed to identify imaging characteristics and biomarkers (n = 9) significantly associated with Dice score.

RESULTS: Five pancreas segmentation models were identified: two Abdomen Atlas models (AAUNet and AASwin, trained on 8448 scans), TotalSegmentator (TS, 1204 scans), nnUNetv1 (MSD-nnUNet, 282 scans), and a U-Net-based model for predicting diabetes (DM-UNet, 427 scans). These were evaluated on 352 CT scans acquired from 2000 to 2023 (30 female, 25 male, 297 sex unknown; age 58 ± 7 years [mean ± 1 SD], 327 age unknown). Overall, TS, AAUNet, and AASwin performed best, with Dice scores of 80 ± 11%, 79 ± 16%, and 77 ± 18%, respectively; these did not differ significantly on pairwise Šidák tests. AASwin and MSD-nnUNet performed worse on non-contrast scans than on contrast-enhanced scans for all metrics (P < .001). The worst performer was DM-UNet (Dice = 67 ± 16%). All algorithms except TS showed lower Dice scores with increasing peri-pancreatic attenuation (P < .01). In multivariate regression, non-contrast scans (P < .001; MSD-nnUNet), smaller pancreatic length (P = .005; MSD-nnUNet), and height (P = .003; DM-UNet) were associated with lower Dice scores.

CONCLUSION: The models trained on diverse sets of scans performed best (TS, AAUNet, and AASwin). TS performed equivalently to AAUNet and AASwin with only about 14% of their training set size (1204 vs 8448 scans). Although trained on the same dataset, the transformer network (AASwin) performed worse on non-contrast scans, whereas its convolutional counterpart (AAUNet) did not. This study shows that the aggregate metrics commonly reported for pancreatic segmentation algorithms are not enough to capture differences in performance across common patient and scan characteristics in clinical populations.
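
The three reported metrics (Dice, HD, ASD) can be illustrated on binary masks. The following is a minimal NumPy sketch, not the authors' evaluation code. The function names (`dice`, `surface_points`, `surface_metrics`) are hypothetical, boundary voxels are defined by face (6-) connectivity, and ASD is taken as the pooled mean over both surfaces, which is only one of several definitions in use (some studies also report the 95th-percentile HD instead of the maximum). Distances are computed by brute force, so the sketch suits small test masks only; for full CT volumes a Euclidean distance transform is the usual approach.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty; a convention, not a universal rule
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates (mm) of boundary voxels: foreground voxels with at least
    one background face neighbour; voxels outside the array count as background."""
    padded = np.pad(mask.astype(bool), 1, constant_values=False)
    interior = padded.copy()
    for axis in range(padded.ndim):
        for shift in (-1, 1):
            # The one-voxel padding means np.roll only wraps background into view.
            interior &= np.roll(padded, shift, axis=axis)
    boundary = (padded & ~interior)[tuple(slice(1, -1) for _ in range(padded.ndim))]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def _nearest_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each point in a, the Euclidean distance to the closest point in b.
    Brute force, O(len(a) * len(b)) memory: fine for toy masks, not full CT volumes."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def surface_metrics(pred: np.ndarray, ref: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Return (Hausdorff distance, average symmetric surface distance), in mm."""
    p, r = surface_points(pred, spacing), surface_points(ref, spacing)
    if len(p) == 0 or len(r) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    d_pr, d_rp = _nearest_distances(p, r), _nearest_distances(r, p)
    hd = max(d_pr.max(), d_rp.max())
    # Pooled mean over both surfaces; other ASD variants average the two directed means.
    asd = (d_pr.sum() + d_rp.sum()) / (len(d_pr) + len(d_rp))
    return float(hd), float(asd)
```

For example, for a 4 × 4 × 4-voxel cube and a copy shifted by one voxel along one axis, `dice` returns 0.75 and `surface_metrics` returns an HD of 1.0 mm at unit spacing.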
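
The stratification by peri-pancreatic attenuation can be sketched in the same way. The abstract does not say how the 5-mm region was constructed, so this sketch assumes a Euclidean dilation of the pancreas mask by 5 mm in physical units (respecting anisotropic voxel spacing), excludes the pancreas itself, takes the mean HU inside the resulting ring, and then assigns each scan to a quartile of that value across the dataset. The function names (`ring_mask`, `peri_pancreatic_hu`, `quartile_bins`) are hypothetical, and the shift-and-OR dilation is chosen for transparency rather than speed.

```python
import itertools

import numpy as np


def ring_mask(pancreas: np.ndarray, spacing, margin_mm: float = 5.0) -> np.ndarray:
    """Voxels within margin_mm (Euclidean distance in mm) of the pancreas, excluding it.

    Assumption: the peri-pancreatic region is an isotropic dilation in physical units.
    """
    pancreas = pancreas.astype(bool)
    spacing = np.asarray(spacing, dtype=float)
    # Largest whole-voxel offset along each axis that can stay within the margin.
    radius = [int(r) for r in np.floor(margin_mm / spacing)]
    padded = np.pad(pancreas, [(r, r) for r in radius], constant_values=False)
    dilated = np.zeros_like(pancreas)
    for offset in itertools.product(*(range(-r, r + 1) for r in radius)):
        # Keep only offsets inside the ellipsoid of radius margin_mm (in mm).
        if np.linalg.norm(np.asarray(offset) * spacing) > margin_mm:
            continue
        # Slice of the padded mask shifted by `offset` relative to each output voxel.
        window = tuple(slice(r + o, r + o + n) for r, o, n in zip(radius, offset, pancreas.shape))
        dilated |= padded[window]
    return dilated & ~pancreas


def peri_pancreatic_hu(ct_hu: np.ndarray, pancreas: np.ndarray, spacing,
                       margin_mm: float = 5.0) -> float:
    """Mean attenuation (HU) inside the peri-pancreatic ring."""
    ring = ring_mask(pancreas, spacing, margin_mm)
    if not ring.any():
        raise ValueError("empty peri-pancreatic region")
    return float(ct_hu[ring].mean())


def quartile_bins(values) -> np.ndarray:
    """Label each scan 1-4 by the quartile of its value within the dataset."""
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [0.25, 0.5, 0.75])
    # right=True places a value equal to a cut point in the lower quartile.
    return np.digitize(values, cuts, right=True) + 1
```

Given per-scan attenuation values and matching Dice scores, the per-quartile means could then be compared across models. Whether the ring should be derived from the reference or the predicted mask is not stated in the abstract and is left to the caller here.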