Abstract
Applying offline reinforcement learning (RL) to real-world clinical data is attracting increasing attention in AI for healthcare, yet practical implementation poses significant challenges. Direct rewards are difficult to define, and inverse RL struggles to infer accurate reward functions from expert behavior in complex environments. Offline RL also suffers from distributional discrepancies between the learned policy and observed human behavior, a critical issue in healthcare applications. To address these challenges in promoting physical activity among older adults at high risk of falls, monitored through wearable sensors, we introduce Kolmogorov-Arnold Networks and Diffusion Policies for Offline Inverse Reinforcement Learning (KANDI). Specifically, KANDI leverages the flexible function approximation of Kolmogorov-Arnold Networks to estimate the reward function from the free-living behavior of low-fall-risk older adults, who serve as experts. In addition, diffusion-based policies within an Actor-Critic framework provide a generative mechanism for action refinement, enabling controlled exploration and mitigating distributional shift in offline RL. We evaluate KANDI on wearable activity-monitoring data from a two-arm clinical trial in our Physio-feedback Exercise Program (PEER) study, emphasizing its practical application in a fall-risk intervention program that promotes physical activity among older adults. Our analysis identifies the optimal timing of anti-sedentary interventions tailored to different levels of fall risk, thereby maximizing daily physical activity. We also evaluate KANDI on the D4RL benchmark, where it outperforms state-of-the-art methods in each domain. These results underscore KANDI's potential to address key challenges in offline RL for healthcare, offering an effective solution for determining both the timing and the policy of activity-promotion interventions.
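To make the KAN component concrete, the following PyTorch sketch shows one way a Kolmogorov-Arnold layer can be built and stacked into a reward model: every input-output edge carries its own learnable univariate function, here approximated with fixed Gaussian radial-basis functions as a simpler stand-in for the B-spline bases used in full KAN implementations. This is a minimal illustration under our own assumptions; the names (KANLayer, KANRewardNet), the basis choice, and all hyperparameters are hypothetical and not taken from the paper's implementation.

    import torch
    import torch.nn as nn

    class KANLayer(nn.Module):
        # One Kolmogorov-Arnold layer: each (input, output) edge has its own
        # learnable 1-D function, parameterized as a weighted sum of Gaussian
        # RBF bases (hypothetical stand-in for B-splines), plus a linear path.
        def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
            super().__init__()
            self.register_buffer(
                "centers", torch.linspace(grid_range[0], grid_range[1], num_basis))
            self.register_buffer(
                "width", torch.tensor((grid_range[1] - grid_range[0]) / num_basis))
            # One coefficient per (output, input, basis) triple.
            self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
            self.base = nn.Linear(in_dim, out_dim)  # residual linear path

        def forward(self, x):  # x: (N, in_dim)
            # Evaluate the RBF bases of each input coordinate: (N, in_dim, B).
            phi = torch.exp(-(((x.unsqueeze(-1) - self.centers) / self.width) ** 2))
            # Sum the learned univariate functions over inputs and bases.
            spline = torch.einsum("nib,oib->no", phi, self.coef)
            return self.base(x) + spline

    class KANRewardNet(nn.Module):
        # Hypothetical reward model mapping a (state, action) pair to a scalar,
        # in the spirit of KANDI's KAN-based reward estimator.
        def __init__(self, state_dim, action_dim, hidden=32):
            super().__init__()
            self.net = nn.Sequential(KANLayer(state_dim + action_dim, hidden),
                                     KANLayer(hidden, 1))

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

    # Example: batched reward estimates for 4 (state, action) pairs.
    reward_net = KANRewardNet(state_dim=6, action_dim=2)
    r = reward_net(torch.randn(4, 6), torch.randn(4, 2))  # shape (4,)

The per-edge learnable functions are what give KANs their flexible function approximation; a full implementation would also fit such a reward model against expert trajectories via an inverse-RL objective.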
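Similarly, the diffusion-policy component can be sketched as a conditional denoiser trained with a standard DDPM-style objective and sampled by reverse diffusion, which is what makes action generation a refinement process amenable to controlled exploration. Again, this is a minimal sketch under assumed design choices (a 20-step linear variance schedule, an MLP noise predictor, a scalar time embedding); it is illustrative, not the authors' implementation.

    class DiffusionPolicy(nn.Module):
        # Hypothetical conditional diffusion policy: predicts the noise added
        # to an action given the state and the diffusion step.
        def __init__(self, state_dim, action_dim, T=20, hidden=64):
            super().__init__()
            self.T, self.action_dim = T, action_dim
            betas = torch.linspace(1e-4, 0.1, T)  # assumed linear schedule
            self.register_buffer("betas", betas)
            self.register_buffer("alphas", 1.0 - betas)
            self.register_buffer("alphas_bar", torch.cumprod(1.0 - betas, 0))
            self.eps_net = nn.Sequential(
                nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, action_dim))

        def loss(self, state, action):
            # Denoising objective: corrupt the dataset action with Gaussian
            # noise at a random step t, then predict that noise.
            t = torch.randint(0, self.T, (state.shape[0],))
            ab = self.alphas_bar[t].unsqueeze(-1)
            noise = torch.randn_like(action)
            noisy = ab.sqrt() * action + (1 - ab).sqrt() * noise
            t_emb = (t.float() / self.T).unsqueeze(-1)
            pred = self.eps_net(torch.cat([state, noisy, t_emb], dim=-1))
            return ((pred - noise) ** 2).mean()

        @torch.no_grad()
        def sample(self, state):
            # Reverse diffusion: refine pure noise into an action conditioned
            # on the state, one denoising step at a time.
            a = torch.randn(state.shape[0], self.action_dim)
            for t in reversed(range(self.T)):
                t_emb = torch.full((state.shape[0], 1), t / self.T)
                eps = self.eps_net(torch.cat([state, a, t_emb], dim=-1))
                ab, al = self.alphas_bar[t], self.alphas[t]
                a = (a - (1 - al) / (1 - ab).sqrt() * eps) / al.sqrt()
                if t > 0:  # add noise at all but the final step
                    a = a + self.betas[t].sqrt() * torch.randn_like(a)
            return a

Training the denoiser on dataset actions anchors sampled actions to the behavior distribution, which is the mechanism by which diffusion policies mitigate the distributional shift that plain offline actor-critic methods suffer from; KANDI's actor-critic coupling of this sampler with a learned critic is described in the paper itself.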