Abstract
Multi-agent path planning for Unmanned Aerial Vehicles (UAVs) in agricultural data collection tasks is challenging because it requires close coordination to keep the swarm efficient and free of conflicts. Existing multi-agent reinforcement learning (MARL) algorithms often struggle with high-dimensional state spaces, continuous action domains, and complex inter-agent dependencies. To address these issues, we propose a novel algorithm, Multi-Agent Transformer-based Soft Actor-Critic (MATRS). Built on the Centralized Training with Decentralized Execution (CTDE) paradigm, MATRS enables safe and efficient collaborative data collection and trajectory optimization. It integrates a Transformer encoder into the centralized critic network, using self-attention to explicitly model the relationships between agents and thereby estimate the joint action-value function more accurately. In simulation experiments, we compare MATRS against established baseline algorithms (MADDPG, MATD3, and MASAC) in scenarios with varying data loads and problem scales. The results show that MATRS consistently converges faster and achieves shorter task completion times than these baselines. Furthermore, in scalability experiments, MATRS learns an efficient "task-space partitioning" strategy, in which the UAV swarm autonomously divides the operational area for conflict-free coverage. These findings indicate that combining attention-based architectures with Soft Actor-Critic learning offers an effective and scalable solution for high-performance multi-UAV coordination in IoT data collection tasks.