Abstract
Agents increasingly operate in complex environments where coherent behavior often emerges from opaque decision-making processes. While such systems can be highly effective, their lack of transparency limits trust, auditing, and meaningful human understanding. We introduce intentional policy graphs, a post hoc, model-agnostic framework that explains agent behavior in terms of intentions: probabilistic commitments to desired outcomes, inferred from partial observations. By extending policy graphs with a formal notion of intention, we move beyond action-level descriptions toward telic explanations of why agents pursue particular trajectories. The framework provides a complete construction pipeline, design principles, and quantitative metrics that explicitly characterize the trade-off between interpretability and reliability. Intentions support structured answers to "what", "how", and "why" questions, enabling both local and global explanations of behavior. We demonstrate the approach on a cooperative multi-agent game and on real-world human driving data, highlighting its generality and explanatory power without requiring access to internal reasoning models.