Abstract
We present MIMIC-III-Ext-PPG, a large-scale, quality-assessed photoplethysmography (PPG) dataset derived from the matched waveform subset of MIMIC-III. The dataset provides 30-second PPG segments with annotations tailored to a range of cardiovascular and respiratory analyses. With 6.3 million segments from 6,189 subjects, it is the largest publicly available resource for heart rhythm classification; its heart rhythm annotations are derived from bedside charted observations. For subsets where arterial blood pressure (ABP), respiratory (RESP), and/or electrocardiography (ECG) signals are available, we also provide systolic and diastolic blood pressure, respiratory rate, and heart rate annotations, extracted from the underlying signals following best practice. Signal quality assessments are provided for all signals. The result is a high-quality, publicly available dataset of unprecedented size that can serve as a benchmarking resource for machine learning approaches across a broad range of prediction tasks, and that can readily be extended with additional clinical metadata from the MIMIC-III clinical database.