Abstract
Contrastive learning methods can be powerful tools for genomics, enabling the identification of signals in an experiment via dimension reduction while using a control to reduce noise. One popular approach is contrastive PCA, which has been used in a variety of settings but does not scale to large datasets. We show that the contrastive PCA objective is an approximation of a Rayleigh quotient, analogous in form to Fisher's linear discriminant analysis and the common spatial patterns method. This Rayleigh quotient formulation, which we call ρPCA, satisfies numerous desirable properties and provides an interpretable form of dimension reduction via generalized eigenvectors. We demonstrate that ρPCA is more accurate than contrastive PCA and much more efficient. We also show, via an analysis of single-nucleus transcriptomics data, how it can be used not only for dimension reduction of data with respect to a control, but also for contrasting conditions. Finally, we discuss probabilistic interpretations of ρPCA that provide further insight into its effective performance.
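To make the generalized-eigenvector view concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Rayleigh quotient takes the form (vᵀAv)/(vᵀBv), with A the covariance of the experimental (target) data and B the covariance of the control (background) data. The function name `rayleigh_quotient_directions`, the `ridge` regularizer, and the synthetic data are all hypothetical and introduced only for illustration.

```python
import numpy as np


def rayleigh_quotient_directions(target, background, k=2, ridge=1e-6):
    """Return the top-k values and directions v of (v^T A v) / (v^T B v).

    A is the sample covariance of `target` and B that of `background`
    (rows are samples, columns are features). `ridge` is an assumed small
    regularizer that keeps B positive definite.
    """
    A = np.cov(target, rowvar=False)
    B = np.cov(background, rowvar=False) + ridge * np.eye(target.shape[1])

    # Whiten with the Cholesky factor B = L L^T, turning the generalized
    # problem A v = lambda B v into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(B)
    M = np.linalg.solve(L, A)        # L^{-1} A
    C = np.linalg.solve(L, M.T)      # L^{-1} A L^{-T}, since A is symmetric
    C = (C + C.T) / 2.0              # remove round-off asymmetry

    evals, U = np.linalg.eigh(C)     # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]
    V = np.linalg.solve(L.T, U[:, order])  # map back: v = L^{-T} u
    return evals[order], V


# Synthetic illustration: feature 0 is high-variance in both datasets
# (shared noise), while feature 2 has extra variance only in the target.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
target = rng.normal(size=(500, 5)) * np.array([5.0, 1.0, 3.0, 1.0, 1.0])

vals, V = rayleigh_quotient_directions(target, background, k=2)
embedding = target @ V  # reduced representation of the target data
```

In this toy setting the leading direction should load on the feature that varies only in the target rather than on the shared high-variance feature, which is the behavior a contrast against a control is meant to produce. The sketch involves a single symmetric eigendecomposition and no tuning parameter beyond the assumed ridge term.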