Abstract
Unmanned aerial vehicle (UAV) swarms offer an efficient solution for collecting data from widely distributed ground users (GUs). However, incomplete environmental information and frequent environmental changes make it difficult for standard centralized planning or pure reinforcement learning approaches to maintain both global solution quality and local flexibility. We propose a hierarchical data collection framework for heterogeneous UAV-assisted wireless sensor networks (WSNs). A small set of high-capability UAVs (H-UAVs), each equipped with substantial computational and communication resources, coordinates regional coverage, trajectory planning, and uplink transmission control for numerous resource-constrained low-capability UAVs (L-UAVs) across power-Voronoi-partitioned areas using multi-agent deep reinforcement learning (MADRL). Specifically, we employ Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to enhance the H-UAVs' decision making and enable coordinated actions. The partitions are updated dynamically according to the GUs' data generation rates and the local L-UAV density, balancing workload and adapting to environmental dynamics. Concurrently, a large number of L-UAVs with limited onboard resources perform self-organized data collection from GUs and opportunistically relay the data to a remote access point (RAP) via the H-UAVs. Within each Voronoi cell, L-UAV motion follows a weighted Vicsek model that incorporates the GUs' age of information (AoI), link quality, and congestion avoidance. This spatial decomposition, combined with decentralized weak-swarm control, scales to large L-UAV deployments. Experiments demonstrate that the proposed strong-and-weak-agent MADDPG (SW-MADDPG) scheme reduces AoI by 30% and 21% relative to the No-Voronoi and Heuristic-HUAV baselines, respectively.
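The power-Voronoi partitioning mentioned above assigns each GU to the H-UAV whose power distance (squared Euclidean distance minus a per-site weight) is smallest, so sites with larger weights claim larger cells. A minimal sketch, assuming illustrative 2-D coordinates and weight values (the abstract does not specify how weights are derived from data rates and L-UAV density):

```python
import numpy as np

def power_voronoi_assign(points, sites, weights):
    """Assign each GU point to the H-UAV site minimizing the power
    distance ||p - s_i||^2 - w_i (a larger weight enlarges that cell)."""
    # points: (N, 2) GU positions; sites: (M, 2) H-UAV positions; weights: (M,)
    d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2 - weights[None, :], axis=1)

# Toy example: the second site is weighted up, e.g. to model a region
# with higher GU data generation (values are illustrative assumptions).
points = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
sites = np.array([[0.0, 0.0], [10.0, 0.0]])
weights = np.array([0.0, 30.0])
labels = power_voronoi_assign(points, sites, weights)
# The midpoint GU at x=5 is pulled into the heavier cell.
```

Recomputing `labels` whenever the weights change is what makes the partition track GU data rates and L-UAV density over time.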
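The weighted Vicsek rule for L-UAV motion can be sketched as a heading update that blends neighbor alignment, AoI-weighted attraction toward GUs, and short-range separation for congestion avoidance. The weight names (`w_align`, `w_aoi`, `w_sep`), the separation rule, and all parameter values below are illustrative assumptions; the abstract does not give the paper's exact weighting:

```python
import numpy as np

def weighted_vicsek_step(pos, theta, targets, aoi, r=2.0, speed=0.5,
                         w_align=1.0, w_aoi=1.5, w_sep=1.0, noise=0.05,
                         rng=None):
    """One weighted-Vicsek heading update for L-UAVs: align with neighbors
    within radius r, steer toward high-AoI GU targets, and push away from
    neighbors closer than r/2 (congestion avoidance)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(pos)
    new_theta = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbr = d < r  # neighborhood mask (includes self)
        # Alignment: mean heading vector of neighbors.
        align = np.array([np.cos(theta[nbr]).mean(),
                          np.sin(theta[nbr]).mean()])
        # Attraction: AoI-weighted pull toward GU targets.
        to_gu = targets - pos[i]
        unit = to_gu / (np.linalg.norm(to_gu, axis=1)[:, None] + 1e-9)
        attract = (aoi[:, None] * unit).sum(axis=0) / (aoi.sum() + 1e-9)
        # Separation: repel from neighbors that are too close.
        close = nbr & (d < r / 2) & (d > 0)
        sep = (pos[i] - pos[close]).sum(axis=0) if close.any() else np.zeros(2)
        v = w_align * align + w_aoi * attract + w_sep * sep
        new_theta[i] = np.arctan2(v[1], v[0]) + noise * rng.uniform(-np.pi, np.pi)
    new_pos = pos + speed * np.c_[np.cos(new_theta), np.sin(new_theta)]
    return new_pos, new_theta
```

Each L-UAV uses only local information (nearby L-UAVs and GU AoI values in its cell), which is what keeps the scheme decentralized and scalable.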