Abstract
INTRODUCTION: Thyroid ultrasound is the primary imaging modality for nodule detection, but manual interpretation is subjective and inefficient because of speckle noise, low contrast, and operator dependence. Deep learning-based segmentation methods often overlook anatomical prior information, which leads to suboptimal performance on atypical nodules and complex backgrounds. METHODS: We propose RTS-Net, a novel segmentation network that integrates a dual-path attention enhancement mechanism (combining spatial and channel attention) with a cascaded graph convolution decoding architecture that leverages multi-scale feature pyramid fusion. A deep supervision strategy is also employed to accelerate convergence. The model is trained and evaluated on the TN3K and DDTI datasets and on a large-scale clinical dataset. RESULTS: Extensive experiments demonstrate that RTS-Net achieves superior performance in both in-distribution and cross-dataset settings. On the TN3K dataset, it attains an F1-score of 81.66% and an IoU of 71.87%; on the DDTI dataset, it achieves an F1-score of 71.10% and an IoU of 60.09%, outperforming state-of-the-art methods including UNet, DeepLabv3+, TransUNet, and recent foundation-model-based approaches. Ablation studies confirm the effectiveness of each proposed component. DISCUSSION: The proposed dual-path attention and graph convolution modules enhance feature representation and boundary integrity, particularly for small nodules and blurred edges. Although RTS-Net generalizes well, failure cases reveal remaining challenges with heterogeneous backgrounds and acoustic artifacts, suggesting that future integration with foundation models such as SAM could further improve robustness.
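
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1-score (equivalently, the Dice coefficient for binary masks) and an IoU can be computed from a predicted and a ground-truth mask. It uses the standard pixel-wise definitions and is not taken from the RTS-Net evaluation code; the function name `f1_and_iou`, the list-based mask representation, and the empty-mask convention are assumptions made for illustration only.

```python
def f1_and_iou(pred, target):
    """Return (F1, IoU) for two binary masks given as equal-shaped 2D lists of 0/1.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # pixel is nodule in both masks
            elif p:
                fp += 1  # predicted nodule, actually background
            elif t:
                fn += 1  # missed nodule pixel

    # Assumed convention: two empty masks count as a perfect match.
    if tp + fp + fn == 0:
        return 1.0, 1.0

    f1 = 2 * tp / (2 * tp + fp + fn)  # Dice / F1 on pixels
    iou = tp / (tp + fp + fn)         # intersection over union
    return f1, iou


# Toy 3x3 example: 2 overlapping pixels, 1 false positive, 1 false negative.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
f1, iou = f1_and_iou(pred, target)
print(f"F1 = {f1:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two quantities are related by IoU = F1 / (2 - F1). Scores averaged over a dataset generally do not satisfy this identity exactly, so dataset-level F1 and IoU figures should be read as separate measurements.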