Abstract
Hidden Markov models (HMMs) are a powerful class of dynamical models for representing complex systems that are only partially observed through sensory data. Existing data collection methods for HMMs, typically based on active learning or heuristics, are often inefficient in stochastic domains where data are costly. This paper introduces a Bayesian lookahead data collection method for inferring HMMs with finite state and parameter spaces. The method plans data collection under uncertainty using a belief state that captures the joint distribution over system states and candidate models. Unlike traditional approaches that prioritize short-term gains, the resulting policy accounts for the long-term impact of data collection decisions on inference performance. We develop a deep reinforcement learning policy that approximates the optimal Bayesian solution by simulating system trajectories offline; once trained, the policy runs in real time and adapts dynamically as new data are collected. The framework supports a wide range of inference objectives, including point-based, distribution-based, and causal inference. Experiments on three distinct systems show significant improvements in inference accuracy and robustness, demonstrating the effectiveness of the approach in uncertain, data-limited environments.
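To make the belief state concrete, the sketch below shows one Bayesian filtering step over a joint belief b(m, s) for a finite set of candidate HMMs. This is a minimal illustration, not the paper's implementation: the array names `T`, `O`, and `belief` and their layouts are assumptions introduced here for exposition.

```python
import numpy as np

def belief_update(belief, T, O, y):
    """One joint belief update after observing symbol y (illustrative sketch).

    belief: (M, S) array, belief[m, s] = P(model=m, state=s | history)
    T:      (M, S, S) array, T[m, s, t] = P(state=t | state=s, model=m)
    O:      (M, S, Y) array, O[m, s, y] = P(obs=y | state=s, model=m)
    """
    # Predict: propagate each candidate model's state distribution forward.
    predicted = np.einsum('ms,mst->mt', belief, T)
    # Correct: weight each (model, state) pair by the likelihood of y.
    updated = predicted * O[:, :, y]
    # Normalize over the joint (model, state) space.
    return updated / updated.sum()
```

Marginalizing the result, `belief.sum(axis=1)`, gives the posterior over candidate models; a lookahead policy of the kind described above would score candidate data collection actions by their expected multi-step effect on this posterior rather than by the immediate information gain alone.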