Abstract
Integrative analysis of multi-omics data provides a more comprehensive and nuanced view of a subject's biological state than any single omics modality alone. However, high dimensionality and ubiquitous modality missingness present significant analytical challenges. Existing methods for incomplete multi-omics data are scarce, do not fully leverage both modality-specific and shared information, and produce task-biased representations. We propose JASMINE, a self-supervised representation learning method for incomplete multi-omics data that preserves both modality-specific and joint information and enhances sample similarity structure. On two different incomplete multi-omics datasets, JASMINE produces embeddings that achieve superior performance across multiple tasks while requiring only a single round of training per dataset.