Abstract
Accurate indoor localization of unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments, particularly for small-object detection and in low-light conditions. We propose Robust Wavelet-Aware YOLO (RWA-YOLO), a vision-based detection framework that integrates a wavelet-aware attention fusion module with a dual multi-path aggregation mechanism to improve small-object detection and multi-scale feature representation. UAV-mounted LEDs provide robust visual perception in low-light indoor scenarios. The UAV's three-dimensional position is estimated by multi-view geometric triangulation, without external beacons or artificial markers. Beyond static localization, the system is validated under dynamic flight conditions, where it reconstructs smooth, temporally coherent trajectories at an update rate of approximately 25 FPS, which is suitable for real-time control loops. Experiments in real indoor environments yield centimeter-level localization accuracy (root mean square error: 9.9 mm; 95th-percentile error: 13.5 mm), outperforming state-of-the-art vision-based methods and matching or exceeding the accuracy of representative hybrid ultra-wideband-vision systems reported in the literature. These results support the effectiveness, robustness, and real-time capability of RWA-YOLO for indoor UAV navigation in constrained environments.
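As an illustration of the multi-view triangulation step mentioned above, the following is a minimal sketch of standard linear (DLT) triangulation, not necessarily the formulation used in this work. It assumes calibrated pinhole cameras with known 3x4 projection matrices and one detected image point per view; the function names (`triangulate_dlt`, `make_camera`, `project`) and all numeric values are hypothetical and chosen only for the example.

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices (assumed known).
    pixels: list of (u, v) image coordinates, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

def make_camera(K, R, t):
    """Build P = K [R | t] for a pinhole camera."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical two-camera rig: identical intrinsics, 1 m baseline along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = make_camera(K, np.eye(3), np.zeros(3))
P2 = make_camera(K, np.eye(3), np.array([-1.0, 0.0, 0.0]))

X_true = np.array([0.3, -0.2, 4.0])                   # illustrative UAV position (m)
pixels = [project(P1, X_true), project(P2, X_true)]   # noise-free detections
X_est = triangulate_dlt([P1, P2], pixels)
```

With noise-free detections the estimate matches the true point up to numerical precision. With real detections, the linear solution is typically used as an initial value and refined by minimizing reprojection error.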