Transfer learning drives automatic HER2 scoring on HE-stained WSIs for breast cancer: a multi-cohort study

Abstract

BACKGROUND: Streamlining the clinical procedure for human epidermal growth factor receptor 2 (HER2) examination is challenging. Previous studies neglected the intra-class variability within both the HER2-positive and HER2-negative groups and lacked multi-cohort validation. To address these gaps, this study collected data from multiple cohorts to develop a robust model for HER2 scoring using only hematoxylin and eosin (HE)-stained whole slide images (WSIs).
METHODS: A total of 578 WSIs were collected from five cohorts, comprising three public and two private datasets. Each WSI underwent adaptive scale cropping. A transfer-learning-based probabilistic aggregation (TL-PA) model was compared with multi-instance learning (MIL)-based models; all models were trained on Cohort A and validated on Cohorts B-D. The better-performing model was further evaluated in the neoadjuvant therapy (NAT) cohort. Scoring performance was assessed using the area under the receiver operating characteristic curve (AUC). Correlations between model scores and specific grades (HER2 levels, pathological complete response (pCR) status, and residual cancer burden (RCB) grades) were evaluated using Spearman rank correlation and Dunn's test. Patch analysis was performed with manually defined features.
RESULTS: For HER2 scoring, TL-PA significantly outperformed the MIL-based models, achieving robust AUCs in four validation cohorts (Cohort A: 0.75, Cohort B: 0.75, Cohort C: 0.77, Cohort D: 0.77). Correlation analysis confirmed a moderate association between model scores and manual reader-defined HER2-IHC status (Spearman coefficient = 0.37, P = 0.001) as well as RCB grades (Spearman coefficient = 0.45, P = 0.0006). In Cohort NAT, with non-pCR as the positive class, the AUC was 0.77. Patch analysis revealed a pattern of decreasing probability from the tumor core to the peritumoral region as malignancy spread outward from the lesion's core.
CONCLUSION: TL-PA shows robust generalization for HER2 scoring with minimal data; however, it still inadequately captures intra-class variability. This indicates that future deep-learning efforts should incorporate more detailed annotations to better align the model's focus with the reasoning of pathologists.
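
The evaluation described in METHODS (slide-level scores assessed by AUC and by Spearman rank correlation against ordinal grades) can be illustrated with a minimal, standard-library Python sketch. The abstract does not specify how TL-PA aggregates patch probabilities, so the mean used in `slide_score` is purely an assumption for illustration. All function names and the toy data are hypothetical and are not taken from the study. Dunn's test is not shown.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels, scores):
    """AUC computed from the Mann-Whitney U statistic (labels are 0/1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def slide_score(patch_probs):
    """ASSUMPTION: aggregate patch probabilities by their mean."""
    return mean(patch_probs)


# Made-up toy data, not taken from the study.
patch_probs_per_slide = [
    [0.1, 0.2, 0.3],  # slide 1
    [0.2, 0.4, 0.3],  # slide 2
    [0.6, 0.5, 0.7],  # slide 3
    [0.9, 0.8, 0.7],  # slide 4
]
her2_positive = [0, 0, 1, 1]  # binary HER2 status per slide
ihc_grade = [0, 1, 1, 3]      # ordinal HER2-IHC grade per slide (with a tie)

scores = [slide_score(p) for p in patch_probs_per_slide]
auc = roc_auc(her2_positive, scores)
rho = spearman_rho(scores, ihc_grade)
print(f"AUC = {auc:.2f}, Spearman coefficient = {rho:.2f}")
```

In practice, established library routines (for example in scikit-learn and SciPy) would normally replace these hand-written helpers and would also supply the P values. The pure-Python versions are used here only to keep the sketch self-contained.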
