Abstract
Cooperative multi-UAV pursuit-evasion under occlusions and sensor noise is challenged by three factors: intermittent observability of the evader, variable observation-window lengths, and non-stationary evader tactics. Together these destabilize prediction and undermine safety-constrained cooperation. To address these challenges, we propose a safe decision-making framework that uses inferred behavior modes and subgoals as intermediate representations for interpretable, uncertainty-aware cooperation. First, an observation-driven generative intent-subgoal model infers the evader's behavior mode and subgoal from short observation windows. Building on this model, we train a length-agnostic trajectory predictor via multi-window knowledge distillation and consistency regularization. The predictor produces future trajectory predictions with calibrated uncertainty for arbitrary observation-window lengths, which reduces cross-window inference inconsistency and lowers online computational cost. From these predictions, we derive belief and risk features and develop a belief-risk-gated hierarchical multi-agent policy. This policy is built on soft actor-critic with a safety projection layer, enabling adaptive strategy switching and a controllable trade-off between efficiency and safety. Experiments in obstacle-rich pursuit-evasion environments with randomized layouts and diverse obstacle configurations show that, compared with representative baselines, the proposed method achieves more stable cooperative capture, safer maneuvering, and lower decision variance, indicating strong robustness and real-time feasibility. Across different observation-window settings, it improves the normalized expected return by approximately 5-7% over the strongest baseline and reduces pursuer losses by roughly 22-25%. Moreover, its end-to-end decision latency consistently remains within the 50 ms control cycle.