Abstract
Pediatric sarcomas present diagnostic challenges due to their rarity and diverse subtypes, often requiring specialized pathology expertise and costly genetic tests. To overcome these barriers, we developed a computational pipeline leveraging deep learning methods to accurately classify pediatric sarcoma subtypes from digitized histology slides. To ensure classifier generalizability and minimize center-specific artifacts, a dataset comprising 867 whole-slide images (WSIs) from three medical centers and the Children's Oncology Group was collected and harmonized. Multiple convolutional neural network and vision transformer (ViT) architectures were systematically evaluated as feature extractors for SAMPLER-based WSI representations, and input parameters, such as tile size combinations and resolutions, were tested and optimized. The analysis showed that advanced ViT foundation models (UNI and CONCH) significantly outperformed earlier approaches, and incorporating multiscale features enhanced classification accuracy. The optimized models achieved high performance, distinguishing rhabdomyosarcoma (RMS) from non-RMS soft-tissue sarcomas (NRSTS) with an AUC of 0.969 and differentiating RMS subtypes (alveolar vs. embryonal) with an AUC of 0.961. Additionally, a two-stage pipeline effectively distinguished scarce Ewing sarcoma images from other NRSTS (AUC = 0.929). Compared with conventional transformer-encoder architectures used for WSI representations, these SAMPLER-based classifiers were three orders of magnitude faster to train, despite operating entirely without a graphics processing unit. This study highlights that digital histopathology paired with rigorous image harmonization provides a powerful solution for pediatric sarcoma classification.
SIGNIFICANCE: An approach pairing a multi-institutional dataset with a published pipeline extends imaging-based diagnostic capabilities and creates the potential for accurate, rapid cancer diagnoses across resource-limited and remote settings. This article is part of a special series: Driving Cancer Discoveries with Computational Research, Data Science, and Machine Learning/AI.
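
The abstract does not spell out how a slide-level representation is built from tile-level features, so the following is only a minimal sketch of one plausible scheme, assuming that each slide is summarized by per-feature quantiles of its tile embeddings and that representations from several tile scales are concatenated. The function names, the choice of ten quantile levels, and the random arrays standing in for UNI or CONCH embeddings are all hypothetical and are not taken from the study.

```python
import numpy as np


def quantile_representation(tile_features, n_quantiles=10):
    """Summarize an (n_tiles, n_features) matrix as per-feature quantiles.

    Hypothetical helper: the number and placement of quantile levels are
    assumptions made for illustration only.
    """
    tile_features = np.asarray(tile_features, dtype=float)
    # Evenly spaced levels strictly inside (0, 1): 0.05, 0.15, ..., 0.95
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    # Quantiles along the tile axis: shape (n_quantiles, n_features)
    q = np.quantile(tile_features, levels, axis=0)
    # Flatten so that each feature's quantiles are stored contiguously
    return q.T.reshape(-1)


def multiscale_representation(features_by_scale, n_quantiles=10):
    """Concatenate quantile representations computed at each tile scale."""
    parts = [quantile_representation(f, n_quantiles) for f in features_by_scale]
    return np.concatenate(parts)


# Random arrays stand in for tile embeddings at two scales (assumption).
rng = np.random.default_rng(0)
slide = [rng.normal(size=(200, 8)), rng.normal(size=(50, 8))]
rep = multiscale_representation(slide)
print(rep.shape)
```

Under these assumptions the vector length depends only on the number of features, quantile levels, and scales, not on how many tiles a slide contains, so a lightweight classifier could be fit on such vectors using a CPU alone.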