Abstract
We propose HERGAST, a system for spatial structure identification and signal amplification in ultra-large-scale and ultra-high-resolution spatial transcriptomics (ST) data. To handle ultra-large ST data, we adopt a divide-and-conquer strategy and devise a Divide-Iterate-Conquer framework tailored to ST data analysis; other computational methods can adopt the same framework to scale to ultra-large ST data. To tackle the potential over-smoothing problem arising from data splitting, we construct a heterogeneous graph network that incorporates both local and global spatial relationships. In simulations, HERGAST consistently outperforms other methods across all settings, with more than a 10% increase in average adjusted Rand index (ARI). In real-world datasets, HERGAST's high-precision spatial clustering identifies SPP1+ macrophages intermingled within colorectal tumors, while the enhanced gene expression signals reveal unique spatial expression patterns of key genes in breast cancer.
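
For readers unfamiliar with the clustering metric cited above, the following is a minimal sketch of how an adjusted Rand index can be computed from two label assignments using the standard pair-counting formula. It is not taken from the HERGAST code base; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index (illustrative helper, not HERGAST code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # With fewer than two items there are no pairs to disagree on.
        return 1.0

    # Contingency counts: how many items share each (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in both, in the first, and in the second partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: hypothetical cluster labels for four spots.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]  # same partition, different label names
print(adjusted_rand_index(truth, relabelled))
```

Because the index depends only on which items are grouped together, renaming the clusters leaves the score unchanged, which is why it suits comparisons between predicted spatial domains and annotated ones.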