Abstract
BACKGROUND: Advances in digital pathology and computer technology have spurred the application of artificial intelligence to histopathology, but the complexity of whole slide images (WSIs) poses challenges for manual annotation and for traditional supervised learning.
METHODS: We propose the Sample-Positive (SP) technique, which exploits the morphological similarity of adjacent tissue in WSIs to sample positive examples. By integrating this pathological prior, namely that similar tissues tend to be spatially adjacent, into self-supervised learning (SSL) frameworks such as SimCLR, MoCo-v3, and SinCLR, we developed an SSL method for WSIs. We validated this approach on a dataset of 65 lung squamous cell carcinoma (LSCC) cases covering four histological categories: necrosis, tumor, stroma, and epithelium. Performance was benchmarked against supervised models and the original SSL frameworks under fine-tuning and linear evaluation, using accuracy (Acc), area under the ROC curve (AUC), and F1 score.
RESULTS: Our proposed SP technique outperformed the baseline SSL methods in both fine-tuning and linear evaluation on the LSCC dataset. In fine-tuning, SPSimCLR and SPMoCo-v3 achieved the highest F1 scores: SPSimCLR (0.9132) improved on SimCLR (0.9067) by 0.7 percentage points and SPMoCo-v3 (0.9133) improved on MoCo-v3 (0.9088) by 0.5 percentage points, while SinCLR (0.9074) performed comparably to the original SSL methods. In linear evaluation, SPSimCLR (0.9082) improved F1 by 1.0 percentage point over SimCLR (0.8978), SPMoCo-v3 (0.9060) improved by 1.2 percentage points over MoCo-v3 (0.8942), and SinCLR (0.9021) surpassed both original SSL methods. Ablation studies showed that overlapping sampling slightly outperformed non-overlapping sampling, and that models trained on patches containing a single tissue type performed better than those trained on patches containing multiple tissue types.
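The reported F1 gains are consistent with absolute differences (percentage points) rather than relative changes: the absolute gains of 0.65, 0.45, 1.04, and 1.18 points round to the quoted 0.7, 0.5, 1.0, and 1.2, whereas the relative gains in linear evaluation (about 1.16% and 1.32%) do not. A short check using only the F1 values quoted in the results; the names `reported` and `f1_gain` are illustrative:

```python
# F1 scores quoted in the abstract: (SP variant, original framework).
reported = {
    "fine-tuning": {"SimCLR": (0.9132, 0.9067), "MoCo-v3": (0.9133, 0.9088)},
    "linear evaluation": {"SimCLR": (0.9082, 0.8978), "MoCo-v3": (0.9060, 0.8942)},
}


def f1_gain(sp, base):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return round(100 * (sp - base), 2), round(100 * (sp - base) / base, 2)


for protocol, rows in reported.items():
    for name, (sp, base) in rows.items():
        abs_pp, rel_pct = f1_gain(sp, base)
        print(f"{protocol}: SP{name} vs {name}: +{abs_pp} pp, +{rel_pct}% relative")
```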
CONCLUSIONS: Overall, combining the SP technique with contrastive learning yields significant improvements in distinguishing histological categories in LSCC, making it effective for WSIs of non-diffuse cancers, in which tissue of the same type tends to be spatially contiguous.
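The adjacency-based positive sampling described in METHODS can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `sample_positive_pair` and `nt_xent`, the 8-neighbour adjacency rule, the half-patch shift used to model "overlapping" sampling, and the random projection standing in for an encoder are all hypothetical choices made for illustration. The loss is the NT-Xent objective used by SimCLR, written out directly rather than taken from a library.

```python
import numpy as np


def sample_positive_pair(wsi, top, left, patch, overlap=True, rng=None):
    """Hypothetical adjacency-based positive sampler (not the authors' code).

    Returns (anchor, positive), where `positive` is a same-sized patch taken
    next to the anchor. With overlap=True the positive is shifted by half a
    patch, so the two crops share tissue; with overlap=False it is the
    neighbouring, non-overlapping tile.
    """
    rng = np.random.default_rng() if rng is None else rng
    step = patch // 2 if overlap else patch
    h, w = wsi.shape[:2]
    # Offsets to the 8 neighbouring positions, kept only if the crop fits in the slide.
    candidates = [
        (dy * step, dx * step)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
        and 0 <= top + dy * step <= h - patch
        and 0 <= left + dx * step <= w - patch
    ]
    if not candidates:
        raise ValueError("no adjacent patch fits inside the slide")
    oy, ox = candidates[rng.integers(len(candidates))]
    anchor = wsi[top:top + patch, left:left + patch]
    positive = wsi[top + oy:top + oy + patch, left + ox:left + ox + patch]
    return anchor, positive


def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for N positive pairs (z1[i], z2[i])."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Row i's positive is row i + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


# Usage with a random array as the "slide" and a fixed random projection as a
# hypothetical stand-in encoder (a real pipeline would use a CNN/ViT backbone).
rng = np.random.default_rng(0)
wsi = rng.random((512, 512, 3))
proj = rng.standard_normal((64 * 64 * 3, 32))
pairs = [sample_positive_pair(wsi, 64 * i, 64 * i, 64, rng=rng) for i in range(1, 5)]
z1 = np.stack([a.reshape(-1) @ proj for a, _ in pairs])
z2 = np.stack([p.reshape(-1) @ proj for _, p in pairs])
print("NT-Xent loss:", nt_xent(z1, z2))
```

In this sketch the positive comes from neighbouring tissue rather than from a second view of the same crop, so each pair encodes the spatial prior that adjacent regions usually share a tissue type. A practical pipeline would presumably still apply the usual SimCLR or MoCo-v3 augmentations to both crops.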