Generating Reward Functions Using IRL Towards Individualized Cancer Screening


Abstract

Cancer screening can benefit from individualized decision-making tools that reduce overdiagnosis. The heterogeneity of cancer screening participants motivates the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and it was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated on their ability to recommend appropriate screening decisions before a cancer diagnosis. The reward functions learned with the MaxEnt IRL algorithm, when combined with the POMDP models for lung and breast cancer screening, demonstrated performance comparable to that of the experts. Cohen's kappa score of agreement between the POMDPs' and physicians' predictions was high for breast cancer and showed a decreasing trend for lung cancer.
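The MaxEnt IRL procedure named in the abstract fits a linear reward r(s, a) = θ·φ(s, a) by gradient ascent, pushing the learner's expected feature counts toward the experts' empirical counts, while an adaptive step size grows when the gradient is shrinking and shrinks when it overshoots. Below is a minimal, self-contained sketch of that loop on synthetic data; the state/action/feature setup and the specific step-size rule are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): 4 patient states x 3 screening actions,
# each state-action pair described by a 5-dimensional feature vector.
n_states, n_actions, n_feat = 4, 3, 5
phi = rng.normal(size=(n_states, n_actions, n_feat))  # feature map phi(s, a)

def softmax_policy(theta):
    """Soft-optimal (MaxEnt) policy for reward r(s, a) = theta . phi(s, a)."""
    q = phi @ theta
    q = q - q.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(q)
    return p / p.sum(axis=1, keepdims=True)

# Synthetic "expert" demonstrations sampled from a ground-truth policy.
true_theta = np.array([1.0, -0.5, 0.8, 0.0, 0.3])
expert_pi = softmax_policy(true_theta)
demos = [(s, rng.choice(n_actions, p=expert_pi[s]))
         for s in rng.integers(0, n_states, size=500)]

# Empirical expert feature expectations and state visitation frequencies.
f_expert = np.mean([phi[s, a] for s, a in demos], axis=0)
state_freq = np.bincount([s for s, _ in demos], minlength=n_states) / len(demos)

# MaxEnt IRL gradient ascent with a simple adaptive step size:
# grow the step while the gradient norm keeps shrinking, else halve it.
theta = np.zeros(n_feat)
alpha, prev_norm = 0.5, np.inf
for _ in range(200):
    pi = softmax_policy(theta)
    # Learner feature expectations under the current soft-optimal policy.
    f_learner = np.einsum('s,sa,saf->f', state_freq, pi, phi)
    grad = f_expert - f_learner          # MaxEnt log-likelihood gradient
    norm = np.linalg.norm(grad)
    alpha = alpha * 1.05 if norm < prev_norm else alpha * 0.5
    prev_norm = norm
    theta += alpha * grad

learned_reward = phi @ theta             # learned r(s, a) for every pair
```

Because the stateless MaxEnt objective is concave in θ, the loop converges to the reward whose soft-optimal policy reproduces the experts' empirical action frequencies; in the full method these learned state-action rewards would then be plugged into the screening POMDP.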
