Abstract
We introduce MediSim, a multi-modal generative model for simulating and augmenting electronic health records (EHRs) across multiple modalities, including structured codes, clinical notes, and medical imaging. MediSim employs a multi-granular autoregressive architecture to simulate missing modalities and visits, along with an iterative, reinforcement-learning-based training procedure to improve simulation in low-data settings. It also uses encoder-decoder model pairs to handle complex modalities such as notes and images. Experiments on outpatient claims and inpatient ICU datasets demonstrate MediSim's superiority over baselines in predicting missing codes, producing enriched data, and improving downstream predictive modeling. Specifically, MediSim improves missing-code prediction by over 74%, enables up to 65% better downstream predictive performance compared to original deficient records missing either some visits or entire data modalities, and produces realistic note and X-ray samples suitable for downstream tasks. MediSim's ability to generate comprehensive, high-dimensional EHR data has the potential to significantly improve AI applications throughout healthcare.