Abstract
Single-cell RNA sequencing enables comprehensive analysis of cellular diversity across biological systems. While current batch correction methods can jointly define cell types across multiple conditions, individuals, or modalities, they typically require matching features or paired samples across datasets. Here, we present shared-private Variational Inference via Product of Experts with Supervision (spVIPES), a probabilistic framework that decomposes unpaired single-cell datasets with nonmatching features into shared and private components. spVIPES learns a probabilistic latent variable model that separates dataset-specific (private) from conserved (shared) cellular features across groups. We implement both supervised and unsupervised variants: the supervised version uses cell-type annotations to guide the Product of Experts, while the unsupervised version leverages optimal transport to identify cell correspondences without requiring labels. We evaluate the performance of spVIPES using simulated data and demonstrate its utility across three diverse biological scenarios: (a) cross-species comparisons, (b) regeneration following long and short acute kidney injury, and (c) interferon-β stimulation of peripheral blood mononuclear cells. spVIPES effectively disentangles dataset-specific and conserved cellular features while matching or exceeding state-of-the-art methods for batch correction. Furthermore, the shared latent space of spVIPES enables more accurate cell-type identification across datasets with nonmatching features than existing methods. spVIPES is implemented using the scvi-tools framework and released as open-source software at https://github.com/nrclaudio/spVIPES.
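
To make the Product of Experts idea concrete, the following is a minimal sketch of how several Gaussian "experts" can be fused into a single shared posterior. It assumes diagonal Gaussian experts, which is the common choice in variational models but is not stated in this abstract; the function name `product_of_gaussian_experts` and the toy inputs are hypothetical and are not part of the spVIPES or scvi-tools API.

```python
from typing import List, Sequence, Tuple


def product_of_gaussian_experts(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse diagonal Gaussian experts N(mean_i, var_i) into one Gaussian.

    Assumption (not taken from the abstract): each expert is a diagonal
    Gaussian. The product of such densities is, up to normalization, a
    Gaussian whose precision is the sum of the expert precisions and whose
    mean is the precision-weighted average of the expert means.
    """
    if len(means) == 0 or len(means) != len(variances):
        raise ValueError("need one variance vector per mean vector")

    n_dims = len(means[0])
    joint_mean: List[float] = []
    joint_var: List[float] = []
    for d in range(n_dims):
        # Precision (inverse variance) of every expert in dimension d.
        precisions = [1.0 / var[d] for var in variances]
        total_precision = sum(precisions)
        # Precision-weighted sum of the expert means in dimension d.
        weighted_sum = sum(p * mu[d] for p, mu in zip(precisions, means))
        joint_var.append(1.0 / total_precision)
        joint_mean.append(weighted_sum / total_precision)
    return joint_mean, joint_var


# Hypothetical example: two experts (e.g. one per dataset) describing the
# same group of cells in a two-dimensional shared latent space.
mean, var = product_of_gaussian_experts(
    means=[[0.0, 2.0], [2.0, 2.0]],
    variances=[[1.0, 0.5], [1.0, 0.5]],
)
```

In this toy case the fused mean lies between the two expert means and the fused variance is smaller than either expert's variance, reflecting that agreement between experts sharpens the shared estimate. How experts are grouped (by cell-type label in the supervised variant, or by optimal-transport correspondences in the unsupervised one) is outside the scope of this sketch.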