Abstract
Predicting multiple future pedestrian trajectories is a challenging task for real-world applications such as autonomous driving and robotic motion planning. Existing methods focus primarily on immediate spatial interactions among pedestrians, often overlooking the influence of the distant spatial environment on future trajectory choices. In addition, jointly achieving trajectory smoothness and temporal consistency remains difficult. We propose MTP-STG, a multimodal trajectory prediction model for crowd scenarios that leverages spatio-temporal graph attention networks. Our method first generates simulated multiview pedestrian trajectory data using the CARLA simulator, then combines the original trajectories with selected multiview trajectories via a convex combination to produce augmented adversarial trajectories. Pedestrian historical data are then encoded with a multitarget detection and tracking algorithm. Taking the augmented trajectories and encoded historical information as inputs, our spatio-temporal graph Transformer models scaled spatial interactions among pedestrians. We further integrate a trajectory smoothing method with a Memory Storage Module to predict multiple future paths based on historical crowd movement patterns. Extensive experiments demonstrate that MTP-STG achieves state-of-the-art performance in predicting multiple future trajectories in crowded scenes.