Abstract
This paper addresses the challenge of fraud detection in medical insurance claims, a pervasive problem that causes significant financial losses in healthcare, using Graph Neural Networks (GNNs). Because healthcare data are intricate, traditional fraud detection methods do not inherently capture the complex relationships and patterns among different entities. We explore the potential of GNNs to identify fraudulent claims by modeling the interactions among entities such as patients, healthcare providers, diagnoses, and services as a heterogeneous graph. We employ two state-of-the-art heterogeneous GNN architectures, HINormer (Heterogeneous Information Network Transformer) and HybridGNN, along with RE-GraphSAGE, a homogeneous GNN (GraphSAGE, "Graph Sample and Aggregate") modified with relation embeddings to handle the heterogeneity of healthcare data. The models are evaluated on real-world claims datasets of different sizes, comprising millions of medical activities. On the small claims dataset, HINormer achieved the highest F-score (84%), followed by RE-GraphSAGE (83%). On the medium-sized dataset, RE-GraphSAGE achieved the highest F-score (84%), followed by HINormer (81%). On the large dataset, HINormer again achieved the highest F-score (82%), followed by RE-GraphSAGE (79%). Additionally, we apply two explainability techniques, GNNExplainer and PGExplainer, to provide insight into the models' decision-making processes and to examine their medical significance.