Abstract
Detecting small drones in infrared (IR) sequences is challenging due to their low visibility, low resolution, and complex cluttered backgrounds, which often lead to high false alarm and missed detection rates. This paper frames drone detection as a spatio-temporal anomaly detection problem and proposes a lightweight pipeline, well suited to edge applications, that runs a statistical temporal anomaly detector, the temporal Reed-Xiaoli (TRX) algorithm, in parallel with a lightweight convolutional neural network, the TCRNet. The TRX detector is unsupervised, while the TCRNet is trained to discriminate between drones and clutter using spatio-temporal patches (chips). The confidence maps from both modules are additively fused to localize drones in video imagery. We compare our method, dubbed TRX-TCRNet, against state-of-the-art drone detection techniques on the Detection of Aircraft Under Background (DAUB) dataset. Our approach requires only 0.17 GFLOPs and 0.83 M parameters, 145 to 795 times fewer computations than competing methods, while achieving one of the highest detection accuracies (mAP(50) of 97.40). Experimental results, including ROC and PR curves, confirm the framework's suitability for real-time operation in resource-constrained environments and embedded systems.
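The additive fusion of the two confidence maps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-map min-max normalization before summation is an assumption, and the input maps are hypothetical examples.

```python
import numpy as np

def fuse_confidence_maps(trx_map, tcrnet_map, eps=1e-8):
    """Additively fuse two per-pixel confidence maps.

    Each map is min-max normalized to [0, 1] before summation so that
    neither module dominates (normalization choice is illustrative).
    """
    def minmax(m):
        m = m.astype(np.float64)
        return (m - m.min()) / (m.max() - m.min() + eps)
    return minmax(trx_map) + minmax(tcrnet_map)

# Hypothetical 4x4 confidence maps with a response at pixel (1, 1)
trx = np.array([[0.0, 1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
tcr = np.array([[0.0, 0.5, 0.0, 0.0],
                [0.0, 0.9, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]])

fused = fuse_confidence_maps(trx, tcr)
peak = np.unravel_index(np.argmax(fused), fused.shape)  # strongest joint response
```

In practice, a detection would be declared at the peak of the fused map (here, pixel (1, 1)), where both modules agree on a strong response.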