The effects of varying class distribution on learner behavior for medicare fraud detection with imbalanced big data

不同类别分布对不平衡大数据下医疗保险欺诈检测学习者行为的影响

阅读:1

Abstract

Healthcare in the United States is a critical aspect of most people's lives, particularly for the aging demographic. This rising elderly population continues to demand more cost-effective healthcare programs. Medicare is a vital program serving the needs of the elderly in the United States. The growing number of Medicare beneficiaries, along with the enormous volume of money in the healthcare industry, increases the appeal for, and risk of, fraud. In this paper, we focus on the detection of Medicare Part B provider fraud which involves fraudulent activities, such as patient abuse or neglect and billing for services not rendered, perpetrated by providers and other entities who have been excluded from participating in Federal healthcare programs. We discuss Part B data processing and describe a unique process for mapping fraud labels with known fraudulent providers. The labeled big dataset is highly imbalanced with a very limited number of fraud instances. In order to combat this class imbalance, we generate seven class distributions and assess the behavior and fraud detection performance of six different machine learning methods. Our results show that RF100 using a 90:10 class distribution is the best learner with a 0.87302 AUC. Moreover, learner behavior with the 50:50 balanced class distribution is similar to more imbalanced distributions which keep more of the original data. Based on the performance and significance testing results, we posit that retaining more of the majority class information leads to better Medicare Part B fraud detection performance over the balanced datasets across the majority of learners.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。