Abstract
In recent decades, Electronic Health Records (EHRs) have become increasingly useful for supporting clinical decision-making and healthcare. EHRs usually contain heterogeneous information, such as structured data in tabular form and unstructured data in textual notes. These different types of information complement each other and together provide a comprehensive picture of a patient's health status. While representation learning for structured EHR data has been studied extensively, the fusion of different types of EHR data (multimodal fusion) remains underexplored, largely because of the complexity of medical coding systems and the noise and redundancy in clinical notes. In this work, we propose MINGLE, a framework that effectively integrates both structure and semantics in EHRs. MINGLE uses a two-level infusion strategy to combine medical concept semantics and clinical note semantics into hypergraph neural networks, which learn the complex interactions among different types of data to generate visit representations for downstream prediction. Experimental results on two EHR datasets, the public MIMIC-III and the private CRADLE, show that MINGLE improves predictive performance by 11.83% relative, enhancing semantic integration as well as multimodal fusion for structured and textual EHR data.