Abstract
Accurate segmentation of polyp tissue in colonoscopy images is crucial for early detection of colorectal cancer. Existing CNN-based approaches capture local dependencies effectively but struggle with long-range relations, while transformer-based methods excel at global context modeling yet often overlook fine local details. Hybrid CNN–transformer models attempt to combine both, but typically overfit to convolutional features, weakening their attention mechanisms. To address these limitations, we propose a Hierarchical Contextual Information Aggregation network (HCIA) for polyp segmentation. HCIA introduces an Interconnected Attention Module (IAM) that applies global attention to features at each level and links them across levels, enabling comprehensive cross-hierarchy information exchange. In parallel, a Hierarchical Aggregation Module (HAM) fuses adjacent feature levels to enrich local contextual representation. This dual refinement allows HCIA to capture global and local dependencies jointly, yielding more precise tissue boundaries. Extensive experiments on multiple polyp segmentation benchmarks show that HCIA achieves state-of-the-art accuracy and superior generalization, underscoring its potential for clinical application.