Evaluation and failure analysis of four commercial deep learning-based autosegmentation software for abdominal organs at risk



Abstract

PURPOSE: Deep learning-based segmentation of organs at risk (OARs) is becoming mainstream in clinical practice because it outperforms atlas-based and model-based autocontouring methods. Although several commercial deep learning-based autosegmentation solutions are now available, their clinical implementation is still at an early stage, and acceptance criteria remain underdeveloped because little is known about the systems' segmentation tendencies and failure modes. As the starting point of the iterative process of clinical implementation, this study focuses on the outlier analysis of four commercial autocontouring tools for abdominal OARs.

MATERIALS AND METHODS: Four autosegmentation tools (Limbus AI, MIM Contour ProtégéAI, Radformation AutoContour, and Siemens syngo.via) were used to segment 111 patient cases. Geometric segmentation accuracy was compared quantitatively with clinical contours using the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95). For each software, the outliers from the quantitative evaluation were analyzed for the liver, stomach, and kidneys, and the possible causes were summarized into six categories: (1) difference in contouring style or guideline, (2) image acquisition and quality, (3) abnormal anatomy of the OAR, (4) abnormal anatomy of abutting organs/tissues, (5) external/internal devices, and (6) other causes.

RESULTS: For liver segmentation, the most prominent cause of discrepancies for Limbus, seen in four of its six outliers, was the presence of a biliary stent or an internal/external biliary drain, together with the resulting pneumobilia. Siemens included abutting organs with CT numbers similar to those of the liver in 5/8 outliers. Radformation included the heart and/or stomach in 12/13 liver outliers. MIM included the stomach in the presence of barium in 5/11 outliers and produced fragmented contours in another 5/11. Only Limbus and Radformation provided stomach segmentation; imaging with barium contrast directly caused incomplete stomach delineation in 10/12 Limbus outliers and 21/25 Radformation outliers. For the kidneys, Radformation and Siemens consistently followed the RTOG contouring guidelines, whereas the institutional contours excluded the renal pelvis in some cases; this mismatch accounted for 19/25 Radformation outliers and 18/23 Siemens outliers. By contrast, Limbus contours appeared to follow a different contouring guideline that excludes the renal pelvis. Fragmented kidney contours were found in 10/15 Limbus outliers and 25/26 MIM outliers. Those from MIM were directly linked to the use of IV contrast in imaging, but there was not enough evidence to identify the origin of the fragmented Limbus contours.

CONCLUSION: The causes of the segmentation outliers of the four commercial deep learning-based autocontouring solutions were summarized for each OAR. This work can help vendors improve their autosegmentation software and inform users of potential failure modes when using the tools.
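
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how DSC and HD95 can be computed from a pair of binary masks. It is not the study's implementation: the abstract does not say how the metrics were computed, and HD95 conventions differ between tools. The function names `dice_coefficient` and `hd95`, the voxel-based surface extraction, and the choice of taking the larger of the two directed 95th percentiles are all assumptions made for illustration.

```python
import numpy as np
from scipy import ndimage


def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total


def _surface(mask):
    """Voxels of the mask that touch the background (one-voxel-thick shell)."""
    return mask & ~ndimage.binary_erosion(mask)


def hd95(mask_a, mask_b, spacing=None):
    """95th-percentile Hausdorff distance between two mask surfaces.

    Assumed convention: the larger of the two directed 95th percentiles.
    `spacing` is the voxel size per axis (e.g. in mm); None means unit voxels.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if not a.any() or not b.any():
        return float("nan")  # distance is undefined if either mask is empty
    surf_a, surf_b = _surface(a), _surface(b)
    # Distance from every voxel to the nearest surface voxel of each mask.
    dist_to_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    dist_to_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    a_to_b = dist_to_b[surf_a]
    b_to_a = dist_to_a[surf_b]
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


if __name__ == "__main__":
    # Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
    reference = np.zeros((10, 10, 10), dtype=bool)
    reference[2:6, 2:6, 2:6] = True
    predicted = np.zeros_like(reference)
    predicted[3:7, 2:6, 2:6] = True
    print("DSC:", dice_coefficient(reference, predicted))
    print("HD95:", hd95(reference, predicted))
```

DSC rewards volumetric overlap, while HD95 is sensitive to localized boundary errors, so a contour that leaks into a neighboring organ can keep a fairly high DSC and still show a large HD95. That is one reason the two metrics are usually reported together.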
