Abstract
Real-time object detection under adverse weather and low-light conditions is crucial for applications such as autonomous driving and intelligent surveillance. This paper presents MDAT-YOLO, a novel object detection framework designed to balance accuracy and efficiency in such challenging environments. The model integrates multi-dimensional attention mechanisms and transformer-based enhancements to strengthen feature extraction and adaptability. It introduces two core modules: DWConv_O, an optimized depthwise separable convolution layer, and ODConv++, an omni-dimensional dynamic convolution module that enhances spatial-, channel-, and kernel-level interactions for improved feature selectivity and dynamic response. A lightweight C3 Transformer (C3TR) block further reduces computational overhead while preserving strong representational capacity. MDAT-YOLO is evaluated on four benchmark datasets: RTTS, VOC-Foggy, ExDark, and a custom foggy PASCAL VOC subset, achieving detection accuracies of 70.50%, 65.14%, 77.40%, and 49.00%, respectively. The model sustains real-time inference at up to 145 FPS, demonstrating robustness and practicality for real-world deployment under diverse environmental conditions.
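The depthwise separable convolution underlying DWConv_O factors a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mixer, reducing the parameter count from C_in·C_out·k² to C_in·k² + C_in·C_out. The sketch below illustrates that factorization in plain NumPy; the function name and shapes are illustrative only and do not reproduce the paper's DWConv_O implementation.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Illustrative depthwise separable convolution (stride 1, 'same' padding).

    x          : input feature map, shape (C_in, H, W)
    dw_kernels : one k x k filter per input channel, shape (C_in, k, k)
    pw_weights : 1x1 pointwise mixing matrix, shape (C_out, C_in)
    returns    : output feature map, shape (C_out, H, W)
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

    # Depthwise step: each channel is filtered independently (no channel mixing).
    dw_out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                dw_out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * dw_kernels[c])

    # Pointwise step: a 1x1 convolution mixes channels at every spatial position.
    return np.einsum('oc,chw->ohw', pw_weights, dw_out)
```

For example, with C_in = 3, C_out = 5, and k = 3, a standard convolution needs 3·5·9 = 135 weights, while the factored form needs only 3·9 + 3·5 = 42, which is the kind of saving that keeps lightweight detectors real-time.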