Abstract
OBJECTIVES: To address the limitations of existing models for research and population health applications in older adults with type 2 diabetes, we developed and validated cardiovascular disease (CVD) and heart failure risk models using linked Medicare claims and electronic health records (EHRs; 2013-2020).
STUDY DESIGN AND SETTING: The study included adults aged >65 years with type 2 diabetes and ≥1 HbA1c measurement before cohort entry (defined as the date of a physician/outpatient visit). Using least absolute shrinkage and selection operator (LASSO) and extreme gradient boosting machine learning algorithms, we predicted the 1-year risk of a composite cardiovascular event (myocardial infarction, stroke, coronary artery revascularization, or hospitalization for heart failure). Separate models were developed for patients with and without baseline CVD, using claims-only and claims-EHR predictors. Models were trained on 70% of the data and validated on the remaining 30%. Model performance was evaluated using c-statistics for discrimination, scaled Brier scores, and calibration curves. We externally validated the models in Clinformatics commercial and Medicare Advantage claims data.
RESULTS: There were 14,776 patients with baseline CVD (mean [SD] age: 77 [8] years) and 10,679 without baseline CVD (mean [SD] age: 74 [7] years). Claims-only models achieved a c-statistic of 0.75 and a Brier score of 0.09 in patients with baseline CVD and a c-statistic of 0.73 and a Brier score of 0.01 in patients without baseline CVD. In both subgroups, calibration intercepts were approximately 0 and calibration slopes were approximately 1. Claims-EHR models performed similarly.
CONCLUSION: In older adults with type 2 diabetes, our models predicted 1-year cardiovascular outcomes with good discrimination and accuracy in patients both with and without a history of CVD.
PLAIN LANGUAGE SUMMARY: Older adults with type 2 diabetes have a high risk of heart disease, heart failure, and death, yet it is difficult to predict who is most at risk. Most existing prediction tools are designed for use during a single clinic visit, not for the large health care databases that researchers use to study treatment safety and effectiveness. In this study, we developed computer-based models using Medicare claims data and, for some models, additional information from EHRs. These models predicted the chance of having a major heart event or dying within 1 year. We created separate models for people with and without existing heart disease because their risk factors differ. Our models accurately predicted risk in both groups. Adding EHR data did not improve performance compared with using claims data alone. This means that claims-only models can still be useful for researchers studying treatments in large health care databases. These models can help identify people at higher risk, guide research on diabetes medications, and support better planning of health care resources.
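For readers less familiar with the performance measures named in the abstract, the following is a minimal Python sketch of how a c-statistic and a scaled Brier score can be computed from observed outcomes and predicted 1-year risks. It is an illustration only, not the authors' code. The function names and the toy data are hypothetical, and the scaled Brier score is assumed to follow the common definition of 1 minus the ratio of the Brier score to the Brier score of a model that predicts the observed event rate for everyone; the abstract does not state which definition was used.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event case has the higher
    predicted risk; tied predictions count as half-concordant.

    Pairwise O(n_events * n_non_events) loop: fine for illustration,
    too slow for a full claims cohort.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """1 - Brier / Brier_reference (assumed definition, see lead-in).

    The reference is the Brier score of a model that assigns every patient
    the observed event rate, which equals rate * (1 - rate).
    """
    rate = sum(y_true) / len(y_true)
    reference = rate * (1.0 - rate)
    if reference == 0.0:
        raise ValueError("scaled Brier score is undefined when all outcomes are equal")
    return 1.0 - brier_score(y_true, y_prob) / reference


# Hypothetical toy data: 1 = composite cardiovascular event within 1 year.
outcomes = [0, 0, 1, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80]

print(c_statistic(outcomes, predicted_risk))
print(brier_score(outcomes, predicted_risk))
print(scaled_brier_score(outcomes, predicted_risk))
```

Under this assumed definition, a lower Brier score is better, whereas a higher scaled Brier score is better. Because the unscaled Brier score depends on the event rate, the values reported for patients with and without baseline CVD are not directly comparable with each other.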