Abstract
PURPOSE: This study aimed to predict the cell types that infiltrate the tumor microenvironment from hematoxylin and eosin (H&E)-stained images of colon cancer and breast cancer samples. METHODS: Two datasets, one of colon cancer and one of breast cancer, were used to develop deep learning models. Cells were segmented with StarDist, and the k-nearest neighbor (KNN) method was then used to construct a neighborhood-enhanced cellular feature extraction matrix for model training. Transductive semi-supervised learning was applied to the breast cancer dataset: the Base-4 model was trained on samples S1 and S2 and then used to assign labels to sets S3, S4, and S5, on which the Base-4+ model was subsequently trained. RESULTS: The Base-7 model trained on colon cancer cell images achieved an accuracy of 0.85 on the hold-out test set and 0.74 on the independent test set, with six neighboring cells identified as the optimal neighborhood size for prediction. On the breast cancer dataset, the Base-4 model achieved a prediction accuracy of 0.69, with four neighboring cells as the optimal neighborhood size, while the Base-4+ model reached an accuracy of up to 0.93 on the validation set. The model also identified invasive and ductal carcinoma cells, with an overall agreement of 0.63 relative to spot-based cell types. CONCLUSIONS: Deep learning models accurately predicted cell types in breast and colon cancer datasets using only cell morphology and neighborhood embedding.
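As an illustration only, the KNN-based neighborhood-enhanced matrix described under METHODS might be built roughly as sketched below. This is not the authors' implementation. It assumes that per-cell centroids and morphology feature vectors have already been extracted from the segmentation masks, that neighbors are defined by Euclidean distance between centroids, and that each row concatenates a cell's own features with those of its k nearest neighbors ordered from nearest to farthest. The function name `build_neighborhood_matrix` and this row layout are hypothetical. The default k=6 mirrors the optimum reported for the colon cancer model.

```python
import numpy as np


def build_neighborhood_matrix(centroids, features, k=6):
    """Hypothetical sketch of a neighborhood-enhanced feature matrix.

    Assumption: row i = [features of cell i, features of its 1st nearest
    neighbor, ..., features of its k-th nearest neighbor], where neighbors
    are ranked by Euclidean distance between cell centroids.
    """
    centroids = np.asarray(centroids, dtype=float)  # shape (n, 2)
    features = np.asarray(features, dtype=float)    # shape (n, d)
    n = centroids.shape[0]
    if k >= n:
        raise ValueError("k must be smaller than the number of cells")

    # Brute-force pairwise Euclidean distances between centroids, O(n^2).
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # A cell must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors per cell, nearest first
    # (stable sort breaks distance ties by lower cell index).
    nbr_idx = np.argsort(dist, axis=1, kind="stable")[:, :k]

    # Stack own features with neighbor features: shape (n, k + 1, d).
    stacked = np.concatenate([features[:, None, :], features[nbr_idx]], axis=1)

    # Flatten to one row per cell: shape (n, (k + 1) * d).
    return stacked.reshape(n, -1), nbr_idx


if __name__ == "__main__":
    # Toy example: four cells on a line, one scalar feature each, k = 2.
    matrix, neighbors = build_neighborhood_matrix(
        [[0, 0], [1, 0], [3, 0], [6, 0]], [[10], [20], [30], [40]], k=2
    )
    print(neighbors.tolist())  # [[1, 2], [0, 2], [1, 0], [2, 1]]
    print(matrix.tolist())
```

Concatenating in distance order gives each row a fixed-length layout a classifier can consume directly. For whole-slide cell counts, a spatial index (for example, a k-d tree) would replace the quadratic distance matrix used here for brevity.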