Abstract
We propose a Transformer-based generative model that learns socially responsive customer trajectories in retail stores directly from data. Each trajectory is represented as a sequence of symbolic tokens encoding not only the focal customer's own location but also the positions of their nearest neighbors at each timestep. This interaction-aware encoding enables the model to reproduce adaptive behaviors, such as slowing down, rerouting, and early disengagement, without predefined rules. To ensure that only the focal customer's behavior is learned while neighbors serve purely as context, we introduce an asymmetric loss-masking scheme that excludes non-focal tokens from the prediction targets. The model is trained from scratch on high-resolution indoor positioning data and validated through large-scale agent-based simulations under varying crowding levels. In these simulations, each agent is equipped with a Transformer module that predicts its next step from local spatial context, allowing the system to evolve through decentralized, data-driven decision-making. The model reproduces the spatial density patterns, dwell-time distributions, and congestion-induced speed reductions observed in real stores, offering a scalable and interpretable approach to trajectory generation in indoor commercial environments.
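To make the asymmetric loss masking concrete, the following is a minimal sketch in plain Python. The token vocabulary, the `focal_mask` flags, and the function name `masked_nll` are illustrative assumptions, not the paper's implementation; the key idea shown is that positions whose target token encodes a neighbor are simply dropped from the training loss, so gradients flow only through predictions of the focal customer's state.

```python
# Hypothetical sketch: neighbor-context tokens appear in the input sequence
# but are excluded from the prediction targets via a boolean mask.
import math

def masked_nll(log_probs, targets, focal_mask):
    """Average negative log-likelihood over focal-customer tokens only.

    log_probs  : per-step dicts mapping token -> log-probability
    targets    : ground-truth next token at each step
    focal_mask : True where the target encodes the focal customer's state,
                 False for neighbor-context tokens (excluded from the loss)
    """
    losses = [-lp[t] for lp, t, m in zip(log_probs, targets, focal_mask) if m]
    return sum(losses) / len(losses)

# Toy example: three timesteps; the middle target is a neighbor token,
# so it contributes nothing to the loss.
log_probs = [
    {"A": math.log(0.9), "B": math.log(0.1)},
    {"A": math.log(0.5), "B": math.log(0.5)},
    {"A": math.log(0.2), "B": math.log(0.8)},
]
targets = ["A", "B", "B"]
focal_mask = [True, False, True]
loss = masked_nll(log_probs, targets, focal_mask)
```

In a framework like PyTorch, the same effect is commonly obtained by setting masked target positions to the loss function's ignore index, so the asymmetry lives entirely in the targets while the input sequence keeps full neighbor context.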