Abstract
Genetically engineered mouse models (GEMMs) of cancer are useful for exploring the development and biological composition of human tumors. Single-cell RNA sequencing (scRNA-seq) provides a transcriptomic snapshot of cancer to explore the heterogeneity of cell states in an immunocompetent context. However, cross-species comparison often suffers from biological batch effects, and inherent differences between species weaken the biological signal that can be gleaned from these models. In this study, we developed scVital, a computational tool that uses a variational autoencoder and a discriminator to embed scRNA-seq data into a species-agnostic latent space, overcoming batch effects and identifying cell states shared between species. In addition, we developed the latent space similarity score, a new metric that evaluates batch correction accuracy by scoring prelabeled clusters instead of creating new clusters, as current methods do. As quantified by latent space similarity, scVital performed comparably to other deep-learning algorithms and rapidly integrated scRNA-seq data of normal tissues across species with high fidelity. When applied to pancreatic ductal adenocarcinoma or lung adenocarcinoma data from GEMMs and primary patient samples, scVital accurately aligned biologically similar cell states. In undifferentiated pleomorphic sarcoma, a test case with no a priori knowledge of cell state concordance between mouse and human, scVital identified a previously unknown cell state that persisted after chemotherapy and is shared by a GEMM and human patient-derived xenografts. These findings establish the utility of scVital in identifying conserved cell states across species to enhance the translational capabilities of mouse models.
SIGNIFICANCE: ScVital is an algorithm for cross-species integration of single-cell data to identify common cell states and facilitate translation of tumor models towards therapeutic application. This article is part of a special series: Driving Cancer Discoveries with Computational Research, Data Science, and Machine Learning/AI.