Abstract
INTRODUCTION: Lung cancer is the leading cause of cancer-related deaths worldwide, and accurate histologic subtype classification is critical for diagnosis and treatment planning. Diagnostic variability and resource disparities, particularly in underrepresented regions such as Latin America, pose substantial challenges. This study developed and evaluated novel artificial intelligence models trained on both global and Latin American pathology samples for subtype classification of hematoxylin and eosin (HE)-stained whole-slide images (WSIs).
METHODS: Two DinoV2-based feature extractors were developed: LungDino, trained on a large data set for task-specific (lung) pathology, and OncoDino, trained on a large data set for general pathology applications. The training data set consisted of 1308 HE-stained WSIs, including 412 adenocarcinomas, 323 squamous cell carcinomas, 41 small cell carcinomas, and 532 benign tissue samples, sourced from The Cancer Genome Atlas and an in-house Latin American data set. A ResNet model trained on ImageNet served as the baseline. Models were evaluated on 79 Latin American WSIs using receiver operating characteristic curves, and heatmaps were generated for tumor localization.
RESULTS: The DinoV2-based models outperformed the ResNet baseline. LungDino achieved the highest overall performance, with areas under the curve (AUCs) of 0.97 for adenocarcinoma and 0.96 for squamous cell carcinoma. OncoDino excelled in underrepresented categories, achieving an AUC of 0.99 for small cell carcinoma, which demonstrates its generalizability. Both models generated interpretable heatmaps, with LungDino demonstrating precise tumor localization. The DinoV2 models also maintained high classification performance in the subset of samples classified as poorly differentiated or undifferentiated in HE pathology reports.
CONCLUSION: These findings underscore the effectiveness of task-specific and general feature extractors in delivering accurate, explainable results and address a gap in artificial intelligence-driven histopathology advancements, paving the way for future clinical applications.
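As an illustration of the per-subtype evaluation described in the methods, the following is a minimal sketch of a one-vs-rest area under the receiver operating characteristic curve, computed as the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one. It is not the study's evaluation code: the function names, the class list, and the toy scores are all hypothetical, and the scores are synthetic values rather than results from the data set.

```python
from typing import Dict, List, Sequence

# Hypothetical class names mirroring the four categories named in the abstract.
CLASSES = [
    "adenocarcinoma",
    "squamous cell carcinoma",
    "small cell carcinoma",
    "benign",
]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    true_classes: Sequence[str],
    class_scores: Sequence[Dict[str, float]],
) -> Dict[str, float]:
    """Compute one AUC per class, treating that class as positive and all others as negative."""
    result = {}
    for name in CLASSES:
        labels = [1 if t == name else 0 for t in true_classes]
        scores = [slide[name] for slide in class_scores]
        result[name] = binary_auc(labels, scores)
    return result


# Synthetic toy input (NOT study data): one slide per class, each scored highest on its own class.
toy_truth: List[str] = list(CLASSES)
toy_scores = [
    {c: (0.7 if c == truth else 0.1) for c in CLASSES}
    for truth in toy_truth
]
toy_aucs = one_vs_rest_auc(toy_truth, toy_scores)
```

The pairwise formulation is quadratic in the number of slides, which is acceptable for a test set of this size; a rank-based implementation would be preferable for larger cohorts.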