Abstract
Precise segmentation of glands in histopathological images is essential for colorectal cancer diagnosis, as changes in gland morphology are associated with pathological progression. Conventional computer-assisted methods rely on dense pixel-level annotations, which are costly and labor-intensive to obtain. This study proposes a two-stage weakly supervised segmentation framework, Multi-Level Attention and Affinity (MAA), which requires only image-level labels and combines a Multi-Level Attention Fusion (MAF) module with an Affinity Refinement (AR) module. The MAF module aggregates hierarchical features from multiple transformer layers to capture global semantic context and generate more comprehensive initial class activation maps. The AR module refines the resulting pseudo-labels by modeling inter-pixel semantic consistency, sharpening boundary delineation and reducing label noise. Experiments on the GlaS dataset show that MAA achieves an Intersection over Union (IoU) of 81.99% and a Dice coefficient of 90.10%, outperforming the state-of-the-art Online Easy Example Mining (OEEM) method by 4.43% in IoU. These results demonstrate the effectiveness of integrating hierarchical attention with affinity-guided refinement for annotation-efficient and robust gland segmentation.
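The multi-level fusion idea described above can be illustrated with a minimal sketch. The function and weighting scheme below are hypothetical simplifications (the paper's actual MAF module is not specified here): it merely shows how per-layer class activation maps from several transformer layers might be combined by weighted averaging and normalized into a single fused map.

```python
import numpy as np

def fuse_multilevel_cams(cams, weights=None):
    """Illustrative sketch: fuse per-layer class activation maps
    (one HxW array per transformer layer) by weighted averaging,
    then min-max normalize the result to [0, 1]. This is a toy
    stand-in for a multi-level attention fusion step, not the
    paper's exact MAF formulation."""
    stacked = np.stack(cams, axis=0)                # (L, H, W)
    if weights is None:
        # Uniform weighting across layers by default.
        weights = np.full(len(cams), 1.0 / len(cams))
    fused = np.tensordot(weights, stacked, axes=1)  # (H, W)
    fused -= fused.min()
    rng = fused.max()
    if rng > 0:
        fused /= rng
    return fused

# Toy example: three coarse "layer" maps for a 4x4 patch.
rng = np.random.default_rng(0)
layer_cams = [rng.random((4, 4)) for _ in range(3)]
fused_cam = fuse_multilevel_cams(layer_cams)
```

In a full pipeline, the fused map would be thresholded into initial pseudo-labels and then refined by an affinity step enforcing inter-pixel consistency, as the abstract describes.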