Abstract
Predicting spatial gene expression from Hematoxylin and Eosin (H&E) histology images offers a promising way to reduce the time and cost of gene expression sequencing, thereby facilitating a deeper understanding of tissue architecture and disease mechanisms. Accurate gene expression prediction requires extracting highly refined features from pathological images; however, existing methods often struggle to capture fine-grained local details and to model gene-gene correlations. Moreover, in bimodal contrastive learning, dynamically and efficiently aligning heterogeneous modalities remains a critical challenge. To address these issues, we propose a novel method for predicting gene expression. First, we introduce a densely connected structure that enables efficient feature reuse, enhancing the capture and mining of fine-grained local features. Second, we leverage state space models to uncover underlying patterns and capture dependencies within 1D gene expression data, enabling more accurate modeling of gene-gene correlations. Furthermore, we design a Residual Kolmogorov-Arnold Network (RKAN) that uses learnable activation functions to dynamically adjust the bimodal mapping based on input characteristics. Through continuous parameter updates during contrastive training, RKAN progressively refines the alignment between modalities. Extensive experiments on two publicly available datasets, GSE240429 and HER2+, demonstrate the effectiveness of our approach and its significant improvements over existing methods. Source code is available at https://github.com/202324131016T/DANet.
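To make the RKAN idea concrete, the following is a minimal NumPy sketch of a residual Kolmogorov-Arnold-style layer: each feature dimension gets a learnable activation expressed as a weighted sum of Gaussian basis functions, with a residual (identity) branch added so the mapping can be adjusted continuously as the coefficients update during contrastive training. The basis choice, class name, and dimensions here are illustrative assumptions, not the paper's exact RKAN implementation.

```python
import numpy as np

class ResidualKANLayer:
    """Illustrative residual Kolmogorov-Arnold-style layer (not the paper's exact design).

    Each feature dimension applies a learnable activation phi(x) built
    as a weighted sum of Gaussian radial basis functions; the input is
    added back (residual branch), so the layer starts near identity and
    can progressively reshape the modality mapping during training.
    """

    def __init__(self, dim, n_basis=8, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(-2.0, 2.0, n_basis)  # fixed basis centers
        self.width = 4.0 / n_basis                      # fixed basis width
        # Learnable coefficients: one set of basis weights per feature dim.
        self.coef = rng.normal(0.0, 0.1, size=(dim, n_basis))

    def forward(self, x):
        # x: (batch, dim). Evaluate each Gaussian basis at every input value.
        d = x[..., None] - self.centers                 # (batch, dim, n_basis)
        basis = np.exp(-((d / self.width) ** 2))
        # Learnable per-dimension activation: weighted sum over the basis.
        phi = np.einsum("bdk,dk->bd", basis, self.coef)
        return x + phi                                  # residual connection

layer = ResidualKANLayer(dim=4)
out = layer.forward(np.zeros((2, 4)))                   # shape (2, 4)
```

In a full model, `self.coef` would be updated by gradient descent on the contrastive loss, which is what lets the learned activations gradually refine the image-gene alignment.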