Abstract
Precise temperature control is crucial for maintaining product quality and optimizing energy efficiency in multi-zone continuous crystallizers. However, such industrial processes typically exhibit complex nonlinear dynamics and strong coupling effects. More critically, physical constraints often prevent sensor installation, rendering temperatures in key regions unobservable and challenging traditional closed-loop control strategies. To address this partial observability and model uncertainty, this paper proposes a Model-Based Reinforcement Learning (MBRL) framework that relies solely on offline historical data. The core innovation lies in a Recurrent State-Space Model (RSSM) that serves not only as a high-fidelity digital twin but also as a real-time "virtual sensor" that infers unobservable system states, providing precise state estimates for downstream policy optimization. In addition, a multi-objective reward function is designed to balance tracking error, stability, and control cost. Experimental results demonstrate that the proposed virtual sensor exhibits strong long-term stability, maintaining high fidelity and effectively suppressing error accumulation over multi-step autoregressive predictions. Consequently, the trained agent outperforms traditional Proportional-Integral-Derivative (PID) and Model Predictive Control (MPC) controllers, improving temperature tracking accuracy by over 67% while reducing control action costs by more than 93%, indicating smoother system operation and enhanced energy efficiency.