Label Noise in Pathological Segmentation Is Overlooked, Leading to Potential Overestimation of Artificial Intelligence

Abstract

Artificial intelligence (AI) has transformed medical imaging, notably in radiology and endoscopy. Semantic segmentation, a pixel-level technique crucial for delineating pathological features, has become pivotal in digital pathology. Pathology segmentation models are typically trained on annotations produced by pathologists. Even when these annotations are created with meticulous care, they often contain label noise, and the types of this noise and its effects on model training remain underexplored. This study combined a survey of public datasets with the synthesis of artificial label noise to evaluate how such noise affects pathology segmentation models. Using publicly available datasets and a breast cancer semantic segmentation dataset, modules were developed to simulate four types of artificial label noise at varying intensity levels. Deep learning models were trained on the resulting datasets, and their performance was evaluated. The results indicated that the models were highly susceptible to overfitting label noise, particularly boundary-dependent noise such as dilation and shrinkage. Discrepancies were identified between apparent performance scores obtained under real-world conditions and true performance scores derived from clean test data. This risk of overestimation was most pronounced for datasets containing boundary-altering noise. Moreover, random combinations of noise degraded generalization even further. This study underscores the critical importance of addressing label noise in pathology datasets. Future efforts should focus on developing standardized methods for quantifying and mitigating label noise and on creating robust benchmarks from noise-inclusive datasets. Improving annotation quality and addressing label noise can enhance the reliability and generalizability of AI in pathology, facilitating broader clinical adoption.
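The boundary-altering noise and the apparent-versus-true score gap described above can be illustrated with a minimal sketch. It assumes binary masks stored as NumPy arrays. It models dilation and shrinkage as morphological dilation and erosion with a cross-shaped element, with the noise level taken as a number of pixels, and it uses the Dice coefficient as the score. The function names (`dilate`, `erode`, `add_boundary_noise`, `dice`) are hypothetical and are not taken from the study, whose noise modules and metrics are not specified in this abstract; the other two noise types are not named here and are not modelled.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 4-connected (cross) structuring element, NumPy only."""
    out = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        # A pixel becomes foreground if it or any of its 4 neighbors is foreground.
        out = p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]
    return out


def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion with the same cross element; pixels outside the image count as background."""
    out = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(out, 1, mode="constant", constant_values=False)
        # A pixel stays foreground only if it and all 4 neighbors are foreground.
        out = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return out


def add_boundary_noise(mask: np.ndarray, kind: str, level: int) -> np.ndarray:
    """Hypothetical noise module: 'dilation' grows and 'shrinkage' erodes the label by `level` px."""
    if kind == "dilation":
        return dilate(mask, level)
    if kind == "shrinkage":
        return erode(mask, level)
    raise ValueError(f"unsupported noise kind: {kind!r}")


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice coefficient between two binary masks (1.0 when both are empty)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = int(pred.sum()) + int(ref.sum())
    if denom == 0:
        return 1.0
    return 2.0 * int(np.logical_and(pred, ref).sum()) / denom


# Toy example: a 10x10 "lesion" inside a 20x20 tile.
clean = np.zeros((20, 20), dtype=bool)
clean[5:15, 5:15] = True

# Assumption: training and test annotations were both drawn 2 px too wide.
noisy = add_boundary_noise(clean, "dilation", level=2)

# Assumption: the model has perfectly learned the annotators' habit, so it predicts `noisy`.
prediction = noisy.copy()

apparent_score = dice(prediction, noisy)  # scored against the noisy test labels
true_score = dice(prediction, clean)      # scored against clean reference labels
print(f"apparent Dice = {apparent_score:.3f}, true Dice = {true_score:.3f}")
# → apparent Dice = 1.000, true Dice = 0.704
```

In this toy setting the model looks perfect when judged against equally noisy test labels, while its agreement with the clean reference is noticeably lower. Swapping `"dilation"` for `"shrinkage"` or raising `level` shows how the gap depends on the kind and intensity of boundary noise.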
