Abstract
The Automatic Checkout (ACO) task aims to generate accurate and complete shopping lists from checkout images. Severe product occlusion, a large number of categories, and cluttered layouts place heavy demands on the robustness and generalization of detection models. To address these challenges, we propose the Edge-Embedded Multi-Feature Fusion Network (E2MF2Net), which jointly optimizes synthetic image generation and feature modeling. On the data side, we introduce the Hierarchical Mask-Guided Composition (HMGC) strategy, which selects natural product poses based on mask compactness and incorporates geometric priors and occlusion tolerance to produce photorealistic, structurally coherent synthetic images. Mask-structure supervision further enhances boundary and spatial awareness. Architecturally, the Edge-Embedded Enhancement Module (E3) embeds salient structural cues to explicitly capture boundary details and facilitate cross-layer edge propagation, while the Multi-Feature Fusion Module (MFF) integrates multi-scale semantic cues to improve feature discriminability. Experiments on the RPC dataset demonstrate that E2MF2Net outperforms state-of-the-art methods, achieving checkout accuracy (cAcc) of 98.52%, 97.95%, 96.52%, and 97.62% on the Easy, Medium, Hard, and Average modes, respectively. Notably, it improves cAcc by 3.63 percentage points in the heavily occluded Hard mode and exhibits strong robustness and adaptability in incremental learning and domain generalization scenarios.
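
For readers unfamiliar with the metric, cAcc as commonly defined for the RPC benchmark counts an image as correct only when the predicted shopping list matches the ground truth in every category and quantity. The following is a minimal Python sketch of that computation, assuming each prediction is given as a list of category labels per image; the function name `checkout_accuracy` and the sample data are illustrative assumptions and are not taken from this work.

```python
from collections import Counter


def checkout_accuracy(predicted, ground_truth):
    """Fraction of images whose predicted shopping list matches exactly.

    Both arguments are lists with one entry per image; each entry is a
    list of category labels (hypothetical format, chosen for illustration).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    # Counter equality compares per-category counts and ignores label order,
    # so a single miscounted or misclassified item makes the image incorrect.
    correct = sum(
        Counter(p) == Counter(g) for p, g in zip(predicted, ground_truth)
    )
    return correct / len(ground_truth)


# Illustrative data only: the second image is missing one "gum".
preds = [["cola", "cola", "chips"], ["gum"], ["milk", "tea"]]
truth = [["chips", "cola", "cola"], ["gum", "gum"], ["milk", "tea"]]
score = checkout_accuracy(preds, truth)
```

Because one counting error fails the whole image, cAcc is a strict metric, which is why heavily occluded scenes in Hard mode are the most demanding setting.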