Abstract
Aerial situational awareness (SA) is challenging due to the inherent complexity of large-scale dynamic entities and their intricate spatio-temporal relationships. While deep learning has advanced SA for specific data modalities (static or time-series), existing approaches often lack the holistic, vision-centric perspective essential to human decision-making. To bridge this gap, we propose a unified GNN-CV framework for operational-level SA. The framework leverages mature computer vision (CV) architectures to process radar-map-like representations, addressing diverse SA tasks within a single paradigm. Key innovations include sparse entity attribute transformation graph neural networks (SET-GNNs), large-scale radar map reconstruction, integrated feature extraction, specialized two-stage pre-training, and adaptable downstream task networks. We rigorously evaluate the framework on two critical operational-level tasks: aerial swarm partitioning and configuration recognition. It achieves end-to-end recognition accuracy exceeding 90.1%. In specialized tactical scenarios featuring small, large, and irregular flight intervals within formations, configuration recognition accuracy surpasses 85.0%; even under significant position and heading disturbances, accuracy remains above 80.4%, with millisecond response cycles. Experimental results highlight the benefits of leveraging mature CV techniques such as image classification, object detection, and image generation, which enhance the efficacy, resilience, and coherence of intelligent situational awareness.