Abstract
Spatial transcriptomics, which captures both gene expression and spatial information, holds great promise for unraveling the complex organization of tissues. In this study, we introduce SpaICL, an image-guided graph contrastive learning framework with a curriculum strategy for spatial transcriptomics clustering. SpaICL integrates gene expression, spatial coordinates, and histological image features into a low-dimensional latent representation that improves the delineation of spatial functional domains. The model uses a complementary masking strategy and a shared graph neural network encoder to generate dual embeddings, while a dual cross-attention mechanism aligns local and global features across modalities. In addition, a curriculum learning module integrates neighborhood information gradually, mitigating the over-smoothing associated with fixed adjacency matrices. We evaluated SpaICL on five benchmark spatial transcriptomics datasets, where it outperformed existing baseline methods. SpaICL also shows strong potential for downstream analyses. The code of SpaICL is available at https://github.com/wenwenmin/SpaICL.
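As an illustration only, the dual-view encoding summarized above can be sketched in NumPy. This is not SpaICL's implementation. Several choices are assumptions: masking gene features (rather than spots), a one-layer GCN on a k-nearest-neighbor spatial graph, an InfoNCE contrastive loss, and a linear schedule that grows k over epochs as a stand-in for the curriculum. The helper names (knn_adjacency, complementary_masks, gcn_encode, info_nce, curriculum_k) are hypothetical. Image features and cross-attention are omitted, and only forward passes are shown, with no gradient updates.

```python
import numpy as np

def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbor adjacency from spatial coordinates, with self-loops (assumed graph)."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a spot is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]           # k closest spots per row
    a = np.zeros((n, n))
    a[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    a = np.maximum(a, a.T)                       # symmetrize
    return a + np.eye(n)                         # add self-loops

def normalize(a):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 thanks to self-loops."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def complementary_masks(n_genes, keep_ratio, rng):
    """Hypothetical gene-level masks: each gene is kept in exactly one of the two views."""
    m = (rng.random(n_genes) < keep_ratio).astype(float)
    return m, 1.0 - m

def gcn_encode(x, a_norm, w):
    """One-layer GCN, ReLU(A_norm X W); the same w is shared by both views."""
    return np.maximum(a_norm @ x @ w, 0.0)

def info_nce(z1, z2, tau=0.5):
    """InfoNCE: the same spot in the other view is the positive, all other spots are negatives."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def curriculum_k(epoch, n_epochs, k_min, k_max):
    """Hypothetical curriculum: grow the neighborhood size linearly from k_min to k_max."""
    frac = epoch / max(n_epochs - 1, 1)
    return int(round(k_min + frac * (k_max - k_min)))

# Toy forward passes on random data (no training step).
rng = np.random.default_rng(0)
n_spots, n_genes, dim, n_epochs = 50, 30, 8, 5
x = rng.random((n_spots, n_genes))               # spot-by-gene expression
coords = rng.random((n_spots, 2))                # spot coordinates
w = rng.normal(scale=0.1, size=(n_genes, dim))   # shared encoder weights
losses = []
for epoch in range(n_epochs):
    a_norm = normalize(knn_adjacency(coords, curriculum_k(epoch, n_epochs, 2, 8)))
    m1, m2 = complementary_masks(n_genes, 0.5, rng)
    z1 = gcn_encode(x * m1, a_norm, w)           # view 1 sees only genes kept by m1
    z2 = gcn_encode(x * m2, a_norm, w)           # view 2 sees the complementary genes
    losses.append(info_nce(z1, z2))
```

Because each gene is visible in only one view, the two embeddings are computed from disjoint feature sets, so the contrastive loss rewards agreement that does not come from copying the same input into both branches. Starting with a small k and enlarging it is one simple way to phase in wider neighborhoods rather than fixing the adjacency from the start.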