Multi-scale fusion semantic enhancement network for medical image segmentation

用于医学图像分割的多尺度融合语义增强网络

阅读:1

Abstract

The application of sophisticated computer vision techniques for medical image segmentation (MIS) plays a vital role in clinical diagnosis and treatment. Although Transformer-based models are effective at capturing global context, they are often ineffective at dealing with local feature dependencies. In order to improve this problem, we design a Multi-scale Fusion and Semantic Enhancement Network (MFSE-Net) for endoscopic image segmentation, which aims to capture global information and enhance detailed information. MFSE-Net uses a dual encoder architecture, with PVTv2 as the primary encoder to capture global features and CNNs as the secondary encoder to capture local details. The main encoder includes the LGDA (Large-kernel Grouped Deformable Attention) module for filtering noise and enhancing the semantic extraction of the four hierarchical features. The auxiliary encoder leverages the MLCF (Multi-Layered Cross-attention Fusion) module to integrate high-level semantic data from the deep CNN with fine spatial details from the shallow layers, enhancing the precision of boundaries and positioning. On the decoder side, we have introduced the PSE (Parallel Semantic Enhancement) module, which embeds the boundary and position information of the secondary encoder into the output characteristics of the backbone network. In the multi-scale decoding process, we also add SAM (Scale Aware Module) to recover global semantic information and offset for the loss of boundary details. Extensive experiments have shown that MFSE-Net overwhelmingly outperforms SOTA on the renal tumor and polyp datasets.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。