Abstract
Object detection in challenging imaging domains like security screening, medical analysis, and satellite imaging is often hindered by signal degradation (e.g., noise, blur) and spatial ambiguity (e.g., occlusion, extreme scale variation). We argue that many standard architectures fail by fusing multi-scale features prematurely, which amplifies noise. This paper introduces the Enhance-Fuse-Align (E-F-A) principle: a new architectural blueprint positing that robust feature enhancement and explicit spatial alignment are necessary preconditions for effective feature fusion. We implement this blueprint in a model named SecureDet, which instantiates each stage: (1) an RFCBAMConv module for feature Enhancement; (2) a BiFPN for weighted Fusion; (3) ECFA and ASFA modules for contextual and spatial Alignment. To validate the E-F-A blueprint, we apply SecureDet to the highly challenging task of X-ray contraband detection. Extensive experiments and ablation studies demonstrate that the mandated E-F-A sequence is critical to performance, significantly outperforming both the baseline and incomplete or improperly ordered architectures. In practice, enhancement is applied prior to fusion to attenuate noise and blur that would otherwise be amplified by cross-scale aggregation, and final alignment corrects mis-registrations to avoid sampling extraneous signals from occluding materials.