Generalized Few-Shot MM-Former For Surgical Scene Panoptic Segmentation



Abstract

Panoptic segmentation is crucial for surgical scene understanding but remains a significant challenge. This is largely due to the high cost of annotation, which often results in class imbalance in existing datasets and, in turn, poor performance on categories with few samples. To address this, we propose a generalized few-shot MM-Former, a three-stage framework: (1) We build surgical image-text pairs from the CholecT50 dataset and use them to fine-tune the Stable Diffusion model to extract multi-scale, image-text fused representations. (2) We train a Mask2Former-based panoptic segmentation decoder on the base classes with sufficient samples, and use it to transform the representations of each image into a set of mask proposals with category predictions. (3) We propose an N-to-M mask matching method. Given a small set of samples from N novel classes, we extract their features as guidance to match M mask proposals, enabling identification of all novel class objects in a single pass. Specifically, each matched proposal is updated with the most likely novel class, while the others keep their original predictions. Finally, all proposals are merged to produce the output. On CholecPanSeg, our newly built surgical panoptic dataset, the method achieves outstanding results under limited data, surpassing previous approaches.
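
The relabeling step in stage (3) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each novel class is summarized by a single prototype vector, that each mask proposal has an embedding of the same dimension, and that a proposal counts as "matched" when its best cosine similarity reaches a cutoff. The function name, the argument names, and the threshold value are all hypothetical.

```python
import numpy as np

def match_novel_classes(prototypes, proposals, base_labels, novel_ids, threshold=0.5):
    """Relabel proposals whose embedding is close to a novel-class prototype.

    prototypes:  (N, D) one feature vector per novel class (assumed form).
    proposals:   (M, D) one embedding per mask proposal (assumed form).
    base_labels: (M,) categories predicted by the base-class decoder.
    novel_ids:   (N,) category id assigned to each novel class.
    threshold:   assumed cosine-similarity cutoff; not taken from the paper.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    q = proposals / np.linalg.norm(proposals, axis=1, keepdims=True)
    sim = q @ p.T                       # (M, N) similarity matrix
    best = sim.argmax(axis=1)           # most likely novel class per proposal
    matched = sim.max(axis=1) >= threshold
    # Matched proposals take the novel class; the rest keep the base prediction.
    labels = np.where(matched, np.asarray(novel_ids)[best], np.asarray(base_labels))
    return labels, matched

# Toy example: N = 2 novel classes, M = 3 proposals, D = 2.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
proposals = np.array([[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]])
labels, matched = match_novel_classes(
    prototypes, proposals, base_labels=np.array([3, 4, 5]), novel_ids=np.array([10, 11])
)
print(labels.tolist(), matched.tolist())  # [10, 11, 5] [True, True, False]
```

All M proposals are compared against all N prototypes in one matrix product, which mirrors the single-pass property described above. The third proposal is dissimilar to both prototypes, so it keeps its original base-class label.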
