Abstract
Recent advances in spatial multi-omics technologies have enabled high-resolution profiling of cellular heterogeneity while preserving spatial context, offering unprecedented opportunities to decipher tissue architecture and intercellular communication. Although existing spatial transcriptomics tools are effective for single-modality analysis, integrated interpretation of multiple omics layers, including the spatial transcriptome, spatial proteome, and spatial epigenome, remains limited by modality-specific technical biases and biological complexity. To address this, we present CoMo, a graph-based framework that integrates multi-modal features through cross-attention mechanisms and couples this with dual optimization: a neighbor-aware contrastive loss for cross-omics feature fusion and a cluster-aware contrastive loss for spatially coherent domain identification. Evaluations on five spatial omics datasets demonstrate superior performance in spatial domain identification compared with state-of-the-art methods. CoMo provides a robust computational tool for multi-omics studies and supports comprehensive tissue characterization through synergistic feature learning.