Abstract
Building detection in remote sensing imagery faces three interdependent challenges: extreme scale variance under dense spatial distributions, orientation instability in off-nadir imagery, and semantic gaps during multi-scale feature fusion. Existing methods address these challenges in isolation and therefore degrade when the challenges co-occur. MAR-YOLO is an integrated rotated object detection framework that extends YOLOv11-OBB through three synergistic innovations. The Multi-scale Feature Adaptive Selection (MFAS) module adaptively filters P2-P5 features through dual-domain weighting, enhancing small-building perception while suppressing redundancy. The adapted Adaptive Feature Pyramid Network (AFPN) employs progressive fusion with scale-matched kernels and learned spatial weights, eliminating the semantic inconsistencies inherent in direct multi-scale concatenation. The RepVGG-based Enhanced Rotated Detection Head (RRD-Head) applies branch-specialized structural reparameterization to address angle-regression instability. Validation on BONAI demonstrates 87.2% mAP50 and 65.3% mAP50-95, improvements of 2.9% and 2.6% over YOLOv11s-OBB, at 95 FPS. Cross-dataset experiments on DOTA, DIOR-R, and HRSC2016 confirm the architecture's robustness across diverse detection scenarios.