Abstract
OBJECTIVE: This study performed an unsupervised machine learning cluster analysis of urine cytology images of high-grade urothelial carcinoma to assess its explanatory potential. MATERIALS AND METHODS: We analyzed 124 urine cytology specimens of urothelial carcinoma collected at Gunma University Hospital between December 2010 and December 2021. Ten cytological image fields were captured per specimen, and the relationship between the images and the pathological T factor was examined by principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) using machine learning (ML) software. Common image features were also verbalized and manually reevaluated. RESULTS: In the t-SNE analysis, the T1-dominant region was characterized by "few cells in the background," whereas the T2-dominant region was characterized by "many cells in the image," "numerous neutrophils in the image," and "abundant tumor cells in the image." Manual reassessment showed significant differences according to muscle invasion status for all of these findings except "abundant tumor cells in the image." Furthermore, histological neutrophil infiltration was associated with the abundance of neutrophils in the cytological specimens. CONCLUSION: The cluster analysis identified previously unreported variations in background cell types and quality associated with muscle invasion status, and manual reassessment demonstrated that the ML-derived findings are explainable.
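The PCA step of such an analysis can be sketched as follows. This is a minimal illustration only, assuming each captured image field has already been converted into a numeric feature vector. The function name `pca_project`, the feature dimensionality, and the random data are hypothetical, and the sketch is not the ML software used in the study. The t-SNE step is not reproduced here; in practice it is often run on PCA-reduced features, for example with scikit-learn's `sklearn.manifold.TSNE`.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2):
    """Project rows of `features` onto their top principal components.

    features: array of shape (n_samples, n_features), e.g. one row per
    image field (hypothetical per-image feature vectors).
    Returns (scores, explained_variance_ratio).
    """
    # Center each feature so the components describe variance around the mean.
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Project the data onto the first n_components axes.
    scores = centered @ vt[:n_components].T
    # Fraction of total variance captured by each retained component.
    variance = s ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio


# Hypothetical example: 124 specimens x 10 image fields, 64 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(124 * 10, 64))
scores, ratio = pca_project(X, n_components=2)
print(scores.shape)  # (1240, 2)
```

Each row of `scores` gives a two-dimensional coordinate for one image field. In an analysis like the one described, these coordinates could then be colored by the specimen's pathological T factor to look for T1- or T2-dominant regions.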