Abstract
Multi-omics characterization of individual cells offers remarkable potential for analyzing the dynamics and relationships of gene regulatory states across millions of cells. Integrating multimodal data remains an open problem: existing integration methods struggle both with accuracy and with retaining modality-specific biological variation. In this paper, we present scHyper (scalable, interpretable machine learning for single cell integration), a low-code and data-efficient deep transfer model designed for integrating paired and unpaired single-cell multimodal data. We benchmark scHyper on datasets drawn from different types of multimodal data. scHyper learns a low-dimensional representation and aligns the covariance matrices of the measured modalities. It achieves high accuracy across different cell lines, even on large-scale, atlas-level datasets, while keeping memory use and computation time low, and it sheds light on regulatory relationships between different types of omics. Altogether, we show that scHyper is a versatile and robust tool for cell-type label transfer and integration of multimodal single-cell datasets.
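To make the idea of aligning covariance matrices concrete, the following is a minimal sketch of one common way such an alignment can be scored, assuming a CORAL-style squared Frobenius distance between the feature covariances of two latent embeddings. The abstract does not specify the objective scHyper actually uses, so the function name `covariance_alignment_loss`, the normalization constant, and the simulated embeddings `z_rna` and `z_atac` are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def covariance_alignment_loss(z_a, z_b):
    """Squared Frobenius distance between the feature covariances of two embeddings.

    Assumption: a CORAL-style penalty, used here only for illustration.
    Rows are cells, columns are latent dimensions. The two inputs may have
    different numbers of cells (unpaired data) but must share the latent size.
    """
    # rowvar=False treats each column (latent dimension) as a variable.
    cov_a = np.cov(z_a, rowvar=False)
    cov_b = np.cov(z_b, rowvar=False)
    d = z_a.shape[1]
    # The 1 / (4 d^2) factor is the conventional CORAL normalization.
    return float(np.sum((cov_a - cov_b) ** 2) / (4 * d * d))


# Hypothetical latent embeddings for two modalities with different cell counts.
rng = np.random.default_rng(0)
z_rna = rng.normal(size=(200, 8))
z_atac = 3.0 * rng.normal(size=(150, 8))  # deliberately on a different scale

loss_same = covariance_alignment_loss(z_rna, z_rna)   # identical covariances
loss_diff = covariance_alignment_loss(z_rna, z_atac)  # mismatched covariances
```

In a training loop, a penalty of this kind would typically be added to the model's other objectives so that the modality-specific embeddings are pushed toward a shared second-order structure. Because the penalty compares covariances rather than individual cells, it does not require the cells to be paired.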