Abstract
Pancreatic cancer (PaC), the third leading cause of cancer-related deaths in the United States, is a highly heterogeneous malignancy that can be divided into multiple potential subtypes, the four main ones being aberrantly differentiated endocrine exocrine (ADEX), immunogenic, progenitor, and squamous. Each PaC subtype is characterized by its own molecular pathways and therapeutic characteristics, so identifying PaC molecular subtypes is essential for downstream patient risk stratification and tailored treatment design. Conventional wet-lab approaches to PaC subtyping, such as microdissection, histopathological studies, or molecular profiling, are often laborious, costly, and time-consuming. To address these concerns, we present MetaPaCS, a novel meta-learning framework that accurately identifies PaC subtypes from transcriptomic data alone. Specifically, after preprocessing, the transcriptome-based feature vectors were classified by 10 base machine learning (ML) classifiers, whose prediction outputs were then combined with the initial preprocessed feature vectors to form a new set of ensemble feature vectors for a meta-learning model. This meta-learning model could learn and leverage the diversity of the base classifiers to boost prediction performance beyond that of any single ML model. Results from 100 repetitions of ten-fold cross-validation on benchmarking datasets demonstrated that MetaPaCS performed significantly better than existing state-of-the-art methods for PaC subtyping. In addition, the meta-learning model outperformed every individual base classifier, confirming that combining the diverse outputs of multiple base classifiers improves ensemble performance. We believe that MetaPaCS is a promising tool for characterizing PaC subtypes and will have a positive impact on downstream risk stratification and personalized treatment design for PaC patients.
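The construction of the ensemble feature vectors described above can be sketched as follows. This is a minimal illustration, not the MetaPaCS implementation: the function `build_ensemble_features`, the two toy classifiers, and the four-value sample are all hypothetical stand-ins, and the real framework uses 10 trained base classifiers on preprocessed transcriptomic features.

```python
from typing import Callable, List, Sequence

# A base classifier is modelled as a function from a feature vector to
# per-subtype scores. These are hypothetical stand-ins for trained models.
BaseClassifier = Callable[[Sequence[float]], List[float]]

SUBTYPES = ["ADEX", "immunogenic", "progenitor", "squamous"]


def build_ensemble_features(
    features: Sequence[float],
    base_classifiers: Sequence[BaseClassifier],
) -> List[float]:
    """Append every base classifier's output to the original features."""
    ensemble = list(features)
    for classify in base_classifiers:
        ensemble.extend(classify(features))
    return ensemble


def toy_classifier_a(features: Sequence[float]) -> List[float]:
    # Hypothetical rule: one-hot vote for the index of the largest feature.
    top = max(range(len(features)), key=lambda i: features[i])
    return [1.0 if i == top else 0.0 for i in range(len(SUBTYPES))]


def toy_classifier_b(features: Sequence[float]) -> List[float]:
    # Hypothetical rule: normalise the features into pseudo-probabilities.
    total = sum(features)
    return [value / total for value in features]


# Made-up 4-value expression vector, for illustration only.
sample = [1.0, 3.0, 4.0, 2.0]
ensemble = build_ensemble_features(sample, [toy_classifier_a, toy_classifier_b])

# 4 original features + 2 classifiers x 4 subtype scores
print(len(ensemble))
```

The resulting vector would then be passed to the meta-learning model. For comparison, scikit-learn's `StackingClassifier` with `passthrough=True` follows a comparable scheme, training the final estimator on the base predictions together with the original features.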