Abstract
BACKGROUND: Improving the concordance of human epidermal growth factor receptor 2 (HER2) examinations among laboratories remains a challenge. In this multi-laboratory study, we investigated the concordance of HER2 immunohistochemistry (IHC) examination under manual and artificial intelligence (AI)-assisted interpretation. METHODS: A tissue microarray (TMA) comprising 53 breast cancer samples was constructed and distributed to 35 participating laboratories. For each sample on every slide, an IHC score of 0, 1+, 2+, or 3+ was recorded. Cases that failed to achieve complete agreement under manual interpretation were then re-evaluated using an AI-assisted microscope. RESULTS: Under manual interpretation, 14 of 53 cases (26.4%) had concordant results across all laboratories: 13 IHC-0 cases and 1 IHC-3+ case. Notably, cases scored as 1+ by at least one laboratory had low overall percentage agreement (OPA) and low Fleiss' kappa values. Of the 39 cases with non-concordant manual interpretation, 14 (35.9%) achieved complete agreement with AI-assisted HER2 interpretation. Among the 23 cases in which manual discrepancies were restricted to scores of 0 and 1+, 16 (69.6%) still showed differences between 0 and 1+ under AI-assisted HER2 interpretation. Disagreements between manual and AI-assisted interpretation occurred significantly more often in sections manually scored as 1+ than in those scored as 0 (58.6% vs. 2.1%, P<0.001). CONCLUSIONS: The weakly staining phenotype leads to poor agreement in the manual interpretation of HER2 IHC-1+ breast cancers. AI-assisted HER2 interpretation offers a viable approach for multi-laboratory studies, effectively avoiding the subjective errors inherent in manual interpretation.
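As an illustration of the two agreement measures named in the results, the following is a minimal sketch of how per-case agreement and Fleiss' kappa could be computed from a table of laboratory scores. It is not the study's analysis code. The function names, the toy data, and in particular the per-case agreement definition (the share of laboratories that assigned the most common score) are assumptions made here for illustration; the abstract does not state how OPA was defined. The kappa calculation follows the standard Fleiss formula and assumes that every case was scored by the same number of laboratories.

```python
from collections import Counter

# HER2 IHC score categories used in the study.
CATEGORIES = ("0", "1+", "2+", "3+")


def modal_agreement(case_scores):
    """Share of laboratories giving the most common score for one case.

    Assumption: this is only one plausible reading of "overall percentage
    agreement"; the abstract does not give the exact definition.
    """
    return max(Counter(case_scores).values()) / len(case_scores)


def fleiss_kappa(ratings):
    """Standard Fleiss' kappa for a list of cases.

    ratings: one list of scores per case, one score per laboratory.
    Every case must have the same number of scores.
    """
    counts = [[Counter(case)[c] for c in CATEGORIES] for case in ratings]
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of scores")

    # Observed agreement: mean proportion of agreeing rater pairs per case.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / pairs for row in counts
    ) / n_cases

    # Chance agreement from the overall share of each category.
    total = n_cases * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(len(CATEGORIES))
    )

    if p_e == 1.0:
        # Every score falls in one category, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 3 cases scored by 4 laboratories each.
toy = [
    ["0", "0", "0", "0"],
    ["1+", "1+", "0", "0"],
    ["3+", "3+", "3+", "3+"],
]
print([modal_agreement(case) for case in toy])
print(round(fleiss_kappa(toy), 3))
```

In this toy table the second case, split between 0 and 1+, is the only one that lowers both measures, which mirrors the pattern reported for weakly staining cases.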