Abstract
The Harvard-Emory ECG Database (HEEDB) is currently the largest open-access collection of 12-lead electrocardiogram (ECG) recordings, developed through a collaboration between Harvard University and Emory University. The database consists of 10,608,417 ECG recordings from 1,818,247 patients at Massachusetts General Hospital (MGH) and 998,844 recordings from 349,548 patients at Emory University Hospital (EUH). All recordings were collected between 1980 and 2022 in clinical settings as part of routine patient care. The ECGs are 10-second, 12-lead recordings sampled at either 250 or 500 Hz and stored in WFDB format. Each ECG is linked to demographic metadata (age, sex, race, ethnicity, education), along with deidentified acquisition dates, last visit dates, and death dates, when available. The dataset includes three forms of annotation associated with the Marquette™ 12SL Analysis Software: (1) batch-reprocessed diagnostic labels generated using the latest available 12SL software (version 24); (2) the original 12SL outputs from the time of ECG acquisition; and (3) the corresponding physician overreads. Additionally, the dataset includes associated ICD-9 and ICD-10 codes with the corresponding diagnosis dates. This database represents a large, diverse, multi-center collection on which machine learning algorithms can be trained and tested for performance and bias.
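As a rough illustration of the figures above, the following Python sketch derives the nominal number of samples per lead for each sampling rate and the combined recording and patient counts. The names `expected_shape` and `SITES` are hypothetical and not part of the database; the (samples, leads) layout is an assumption, individual recordings may deviate from the nominal 10-second length, and the patient total assumes that no patient appears at both sites.

```python
# Nominal recording parameters stated in the abstract.
DURATION_S = 10
N_LEADS = 12
ALLOWED_FS_HZ = (250, 500)


def expected_shape(fs_hz: int) -> tuple[int, int]:
    """Return (samples_per_lead, leads) for a nominal 10-second recording.

    Hypothetical helper: assumes a (samples, leads) array layout.
    """
    if fs_hz not in ALLOWED_FS_HZ:
        raise ValueError(f"unexpected sampling rate: {fs_hz} Hz")
    return DURATION_S * fs_hz, N_LEADS


# Per-site counts as reported in the abstract.
SITES = {
    "MGH": {"recordings": 10_608_417, "patients": 1_818_247},
    "EUH": {"recordings": 998_844, "patients": 349_548},
}

# Combined totals; the patient sum assumes no overlap between sites.
total_recordings = sum(site["recordings"] for site in SITES.values())
total_patients = sum(site["patients"] for site in SITES.values())

if __name__ == "__main__":
    for fs in ALLOWED_FS_HZ:
        print(fs, "Hz ->", expected_shape(fs))
    print("total recordings:", total_recordings)
    print("total patients:", total_patients)
```

A check of this kind can serve as a sanity filter when loading records, flagging any file whose sampling rate or signal length differs from the nominal values before it reaches a training pipeline.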