Abstract
The Segment Anything Model (SAM), a vision foundation model, struggles with fully automatic segmentation of specific objects. Its "segment everything" mode relies on a grid-based prompting strategy and therefore suffers from localization blindness and computational redundancy, leading to poor performance on tasks such as Dichotomous Image Segmentation (DIS). To address this, we propose PMG-SAM, a framework that introduces a Pre-Mask Guided paradigm for automatic targeted segmentation. Our method employs a dual-branch encoder to generate a coarse global Pre-Mask, which then acts as a dense internal prompt to guide the segmentation decoder. A key component, the proposed Dense Residual Fusion Module (DRFM), iteratively co-refines multi-scale features to significantly enhance the Pre-Mask's quality. Extensive experiments on the challenging DIS and Camouflaged Object Segmentation (COS) tasks validate our approach. On the DIS-TE2 benchmark, PMG-SAM boosts the maximal F-measure from SAM's 0.283 to 0.815. Notably, our fully automatic model even surpasses SAM and SAM2 when they are prompted with ground-truth bounding boxes, while using only 22.9 M trainable parameters (58.8% of SAM2-Tiny's parameter count). PMG-SAM thus presents an efficient and accurate paradigm for resolving the localization bottleneck of large vision models in prompt-free scenarios.
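To make the Pre-Mask Guided flow concrete, the following is a minimal PyTorch-style sketch of the pipeline described above (dual-branch encoder, DRFM fusion, coarse Pre-Mask used as a dense prompt to the decoder). All module and parameter names here (PreMaskBranch stand-ins, `DRFM`, `PMGSAMSketch`, channel sizes, iteration count) are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only; the real PMG-SAM uses SAM's image encoder and mask
# decoder, which are replaced here by small convolutional stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DRFM(nn.Module):
    """Hypothetical Dense Residual Fusion Module: iteratively co-refines two
    multi-scale feature maps with residual fusion steps."""
    def __init__(self, channels: int, iterations: int = 2):
        super().__init__()
        self.iterations = iterations
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Align the coarser branch, then fuse repeatedly with residual connections.
        feat_b = F.interpolate(feat_b, size=feat_a.shape[-2:], mode="bilinear",
                               align_corners=False)
        fused = feat_a
        for _ in range(self.iterations):
            fused = fused + self.fuse(torch.cat([fused, feat_b], dim=1))
        return fused


class PMGSAMSketch(nn.Module):
    """Dual-branch encoder -> coarse Pre-Mask -> dense internal prompt -> decoder."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch_a = nn.Conv2d(3, channels, 3, stride=4, padding=1)  # stand-in for the SAM image encoder branch
        self.branch_b = nn.Conv2d(3, channels, 3, stride=8, padding=1)  # stand-in for a lightweight second branch
        self.drfm = DRFM(channels)
        self.pre_mask_head = nn.Conv2d(channels, 1, 1)                  # predicts the coarse global Pre-Mask
        self.decoder = nn.Conv2d(channels + 1, 1, 1)                    # stand-in for the SAM mask decoder

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.drfm(self.branch_a(image), self.branch_b(image))
        pre_mask = torch.sigmoid(self.pre_mask_head(feat))              # dense internal prompt
        logits = self.decoder(torch.cat([feat, pre_mask], dim=1))       # Pre-Mask guides decoding
        return F.interpolate(logits, size=image.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = PMGSAMSketch()
    out = model(torch.randn(1, 3, 256, 256))
    print(out.shape)  # torch.Size([1, 1, 256, 256])
```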