Abstract
INTRODUCTION: Cardiovascular research faces challenges such as limited patient data, privacy concerns, and analytical complexity. EPICOSAI addresses these challenges by generating high-fidelity synthetic data, enabling robust analysis without compromising patient confidentiality.
METHODS: Synthetic patient data were generated from real cardiovascular datasets using Monte Carlo simulation. EPICOSAI applied normal, binomial, Bernoulli, and exponential distributions to model variables such as age, ejection fraction, comorbidities, and time-to-event outcomes. The generated datasets were validated using descriptive statistics, distribution matching, and comparisons of machine learning models.
RESULTS: Means and variances in the synthetic datasets deviated from those of the original data by less than 5%. In heart failure cohorts, the distributions of ejection fraction and age closely matched those of the real data. Expanding datasets from 150 to 12,000 patients enabled subgroup analyses and improved model performance. Machine learning models trained on synthetic data achieved 91–94% accuracy, comparable to that of models trained on real data.
CONCLUSION: EPICOSAI enables cardiovascular researchers to overcome data limitations ethically and efficiently. By simulating realistic datasets while preserving privacy, it improves research quality and supports advanced statistical and AI-based analyses in cardiology.
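The Monte Carlo generation and moment-matching check described in the abstract can be illustrated with a minimal sketch. This is not EPICOSAI's implementation. The variable names, the mapping of each variable to a distribution, and every parameter value in `REF` are hypothetical assumptions chosen for illustration, and no truncation to clinically plausible ranges is applied. The sketch draws a synthetic cohort and checks that sample means and variances fall within 5% of the reference values, mirroring the <5% criterion reported in the results. Here the theoretical moments of `REF` stand in for summary statistics of a real cohort.

```python
import math
import random

# HYPOTHETICAL reference parameters, standing in for values estimated from a
# small real cohort. Illustrative assumptions only, not EPICOSAI figures.
REF = {
    "age": {"mean": 65.0, "sd": 11.0},         # normal, years
    "ef": {"mean": 38.0, "sd": 12.0},          # normal, ejection fraction in %
    "n_comorbid": {"n": 6, "p": 0.35},         # binomial count of comorbidities
    "diabetes": {"p": 0.40},                   # Bernoulli indicator
    "days_to_event": {"rate": 1 / 400},        # exponential, events per day
}


def simulate_patient(rng):
    """Draw one synthetic patient record from the assumed distributions."""
    n, p = REF["n_comorbid"]["n"], REF["n_comorbid"]["p"]
    return {
        "age": rng.gauss(REF["age"]["mean"], REF["age"]["sd"]),
        "ef": rng.gauss(REF["ef"]["mean"], REF["ef"]["sd"]),
        # Binomial draw as the sum of n independent Bernoulli trials.
        "n_comorbid": sum(rng.random() < p for _ in range(n)),
        "diabetes": int(rng.random() < REF["diabetes"]["p"]),
        "days_to_event": rng.expovariate(REF["days_to_event"]["rate"]),
    }


def simulate_cohort(size, seed=0):
    """Monte Carlo cohort; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [simulate_patient(rng) for _ in range(size)]


def reference_moments():
    """Theoretical (mean, variance) implied by REF for each variable."""
    n, p = REF["n_comorbid"]["n"], REF["n_comorbid"]["p"]
    q = REF["diabetes"]["p"]
    lam = REF["days_to_event"]["rate"]
    return {
        "age": (REF["age"]["mean"], REF["age"]["sd"] ** 2),
        "ef": (REF["ef"]["mean"], REF["ef"]["sd"] ** 2),
        "n_comorbid": (n * p, n * p * (1 - p)),
        "diabetes": (q, q * (1 - q)),
        "days_to_event": (1 / lam, 1 / lam ** 2),
    }


def relative_deviations(cohort):
    """Relative deviation of the sample mean and sample variance per variable."""
    out = {}
    for var, (mu, sigma2) in reference_moments().items():
        values = [row[var] for row in cohort]
        m = math.fsum(values) / len(values)
        v = math.fsum((x - m) ** 2 for x in values) / (len(values) - 1)
        out[var] = (abs(m - mu) / mu, abs(v - sigma2) / sigma2)
    return out


def within_tolerance(cohort, tol=0.05):
    """True if every mean and variance deviates by less than tol."""
    return all(dm < tol and dv < tol for dm, dv in relative_deviations(cohort).values())


if __name__ == "__main__":
    cohort = simulate_cohort(12_000, seed=42)
    for var, (dm, dv) in relative_deviations(cohort).items():
        print(f"{var:14s} mean dev {dm:.2%}  variance dev {dv:.2%}")
    print("within 5%:", within_tolerance(cohort))
```

Heavy-tailed variables such as the exponential time-to-event converge slowest: the relative standard error of its sample variance is roughly sqrt(8/n), so the check becomes more reliable as the cohort grows. In a real workflow the reference parameters would presumably be estimated from the original dataset rather than fixed by hand.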