Abstract
The rapid advancement of single-cell omics technologies, such as single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), has transformed our understanding of cellular heterogeneity and regulatory mechanisms. However, integrating these data types remains challenging because of differences in their data distributions and their distinct feature spaces. To address this, we present sCIN, a novel single-cell Contrastive INtegration framework that integrates different omics modalities into a shared low-dimensional latent space. sCIN uses modality-specific encoders and contrastive learning to generate latent representations for each modality, aligning cells across modalities and removing technology-specific biases. The framework was designed to rigorously prevent data leakage between training and testing. It was extensively evaluated on three real-world paired datasets, namely simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), 10X PBMC (10k version), and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), and on one unpaired dataset of gene expression and chromatin accessibility. Paired datasets are multi-omics data generated by technologies that capture different omics features from the same cells, whereas unpaired datasets are measured from different cells of the same tissue. Results on paired and unpaired datasets show that sCIN outperforms state-of-the-art models, including scGLUE, scBridge, sciCAN, Con-AAE, Harmony, and MOFA+, across multiple metrics: average silhouette width (ASW) for clustering quality, and Recall@k, cell type@k, cell type accuracy, and median rank for integration quality. Moreover, sCIN was evaluated on simulated unpaired datasets derived from paired data, demonstrating its ability to leverage available biological information for effective multimodal integration.
In summary, sCIN reliably integrates omics modalities while preserving biological meaning in both paired and unpaired settings.
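To make the contrastive alignment and two of the retrieval metrics named above more concrete, the following NumPy sketch shows one common way they can be computed for paired data. It is an illustration under stated assumptions, not the sCIN implementation: it assumes a CLIP-style symmetric InfoNCE loss with cosine similarity and a fixed temperature, and it assumes Recall@k is the fraction of cells whose true partner in the other modality ranks within the top k by cosine similarity, with median rank being the median of those partner ranks. All function names and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row (one cell's embedding) to unit length so that
    # dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_rna, z_atac, temperature=0.1):
    # Hypothetical contrastive loss (assumed CLIP-style, not necessarily sCIN's).
    # Row i of both matrices is the same cell (paired data), so positives sit on
    # the diagonal and all other cells in the batch act as negatives.
    sim = l2_normalize(z_rna) @ l2_normalize(z_atac).T / temperature
    idx = np.arange(sim.shape[0])
    loss_rna_to_atac = -log_softmax(sim, axis=1)[idx, idx].mean()
    loss_atac_to_rna = -log_softmax(sim, axis=0)[idx, idx].mean()
    return 0.5 * (loss_rna_to_atac + loss_atac_to_rna)


def true_partner_ranks(z_query, z_target):
    # 1-based rank of each query cell's true partner among all target cells,
    # ordered by cosine similarity (rank 1 = partner is the nearest cell).
    sim = l2_normalize(z_query) @ l2_normalize(z_target).T
    true_sim = np.diag(sim)
    return (sim > true_sim[:, None]).sum(axis=1) + 1


def recall_at_k(z_query, z_target, k):
    # Assumed definition: fraction of cells whose partner ranks within the top k.
    return float(np.mean(true_partner_ranks(z_query, z_target) <= k))


def median_rank(z_query, z_target):
    # Assumed definition: median partner rank (lower is better).
    return float(np.median(true_partner_ranks(z_query, z_target)))


if __name__ == "__main__":
    # Toy paired embeddings: the "ATAC" latent is a noisy copy of the "RNA" latent.
    rng = np.random.default_rng(0)
    z_rna = rng.normal(size=(100, 16))
    z_atac = z_rna + 0.1 * rng.normal(size=(100, 16))
    print("loss:", symmetric_info_nce(z_rna, z_atac))
    print("Recall@5:", recall_at_k(z_rna, z_atac, k=5))
    print("median rank:", median_rank(z_rna, z_atac))
```

In this sketch the diagonal of the similarity matrix carries the positive pairs, so a well-aligned latent space yields a low loss, a Recall@k close to 1, and a median rank close to 1, while shuffling the pairing degrades all three.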