Abstract
OBJECTIVE: Studying rare diseases such as dermatomyositis (DM) in single-center cohorts is challenging because of small sample sizes and limited generalizability. This study develops and evaluates case identification algorithms for DM to enable coordinated analysis across multiple data sources.

METHODS: Case identification algorithms were developed to identify adult patients with DM within 11 independent electronic health record or claims databases, totaling over 800 million patients, using the Observational Medical Outcomes Partnership Common Data Model. Algorithm performance was assessed through manual chart review and with Observational Health Data Sciences and Informatics open-source tools (CohortDiagnostics and PheValuator), which quantify incidence rates and performance metrics such as sensitivity and positive predictive value (PPV).

RESULTS: Eight DM case identification algorithms were evaluated across 11 databases, revealing significant variability in performance, with sensitivity and PPV differing by more than 30% between some databases. Overall, we identified one incidence algorithm and one prevalence algorithm with good performance, with sensitivities of 42% and 49% and PPVs of 83% and 84%, respectively. PheValuator quantified algorithm performance within each database, allowing direct comparison of different criteria. Additionally, CohortDiagnostics generated incidence rates stratified by age decile and sex that aligned with previous epidemiologic data.

CONCLUSION: We developed and validated multiple DM case identification algorithms across diverse databases and demonstrated their accuracy through multiple evaluation methods. This approach enables more generalizable, reproducible research using real-world data and can be applied to other rheumatic diseases.