Abstract
Detecting abandoned or small-scale objects under adverse weather conditions remains a significant challenge for road safety and urban security in smart cities. Existing detection systems often degrade in fog, rain, snow, or sandstorms, where reduced visibility and background noise obscure object boundaries. To address this limitation, this study proposes a Self-Attention Driven Multi-Scale Object Detection Framework, a tightly integrated pipeline that combines the Adaptive Dual-Background Model (ADBM), the Pixel-based Finite State Machine (PFSM), and the Attention-based Scale Module (ASM) with the Self-Attention Optimized You Only Look Once (SAO-YOLO) network. Unlike conventional YOLO-based approaches that treat enhancement modules independently, the proposed framework enables holistic interaction among background refinement, feature selection, and attention-guided detection, yielding more stable and context-aware predictions. Furthermore, a thermal-visual feature fusion mechanism aligns the two modalities at the P3 level of the Feature Pyramid Network (FPN), improving robustness under poor illumination. Experimental evaluation on the Detection in Adverse Weather Nature (DAWN) dataset demonstrates superior performance, achieving 97.25% accuracy, 95.33% precision, 96.37% recall, and a 96.55% F1-score, outperforming recent state-of-the-art models such as YOLOv5, YOLOv8, and the DEtection TRansformer (DETR). These results validate the effectiveness of the integrated ADBM-PFSM-ASM-SAO-YOLO design in maintaining high detection reliability under complex weather variations. The proposed model offers a practical solution for real-time traffic monitoring, urban surveillance, and public safety applications where detection consistency under environmental uncertainty is critical.