Abstract
BACKGROUND: Accurate and early diagnosis to optimise rare disease care is a global priority. With recent developments in artificial intelligence (AI)-based solutions, a promising way to improve rare disease diagnosis is to apply AI to routinely collected health care data captured in electronic health records (EHRs). MendelScan is a rare disease case-finding tool that analyses structured EHR data, using algorithms to identify patterns associated with an increased likelihood that the patient is affected by one of a number of rare diseases, so that such patients can be put forward for further review. METHODS: We evaluated the performance of the MendelScan case-finding algorithms for 34 rare diseases in a retrospective validation study using research EHR data. The primary objectives were to assess MendelScan’s ability to correctly distinguish cases from controls, and to investigate other metrics indicating the feasibility of large-scale deployment and the time at which patients were identified (flagged) relative to diagnosis. We measured algorithm performance by sensitivity, specificity, positive predictive value (PPV), and likelihood ratios. RESULTS: Performance varied widely across algorithms and metrics. Sensitivity ranged from 0 to 100%, but for the majority of algorithms it was under 25% (median 3.8%; IQR: 1.2–12.6%), whereas specificity for most algorithms was above 99.995% (median 99.9966%; IQR: 99.9925–99.9988%). Median PPV adjusted by literature prevalence was 3.1% (IQR: 0.7–14.6%) and by coded prevalence, 2.5% (IQR: 0.4–8.4%). Median positive likelihood ratio was 1167 (IQR: 125–4006), reflecting a strong signal for disease presence, and median negative likelihood ratio was 0.96 (IQR: 0.87–0.99), reflecting limited clinical utility of a negative result. CONCLUSIONS: Our findings demonstrate the potential of using routinely collected EHR data to facilitate earlier diagnosis of rare diseases.
Real-world evaluations are required to fully ascertain the impact of such case-finding algorithms on the detection and diagnosis of patients with rare diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13023-026-04240-6.
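
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the `Confusion` class, the function names, the example counts and the assumed prevalence of 1 in 10,000 are all hypothetical. It uses the standard definitions of sensitivity, specificity and likelihood ratios, and assumes that "PPV adjusted by prevalence" means applying Bayes' rule with an external prevalence estimate, which the abstract does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical counts from a case-control validation of one algorithm."""
    tp: int  # cases that were flagged
    fn: int  # cases that were not flagged
    fp: int  # controls that were flagged
    tn: int  # controls that were not flagged


def sensitivity(c: Confusion) -> float:
    # Proportion of true cases that the algorithm flags.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Proportion of controls that the algorithm leaves unflagged.
    return c.tn / (c.tn + c.fp)


def positive_lr(sens: float, spec: float) -> float:
    # LR+ = sensitivity / (1 - specificity); infinite if no control is flagged.
    return math.inf if spec == 1.0 else sens / (1.0 - spec)


def negative_lr(sens: float, spec: float) -> float:
    # LR- = (1 - sensitivity) / specificity.
    return (1.0 - sens) / spec


def adjusted_ppv(sens: float, spec: float, prevalence: float) -> float:
    # Bayes' rule: P(disease | flagged) at an assumed population prevalence.
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    total = true_pos + false_pos
    return math.nan if total == 0.0 else true_pos / total


# Illustrative numbers only (not taken from the study).
example = Confusion(tp=5, fn=95, fp=10, tn=99_990)
sens = sensitivity(example)
spec = specificity(example)
print(f"sensitivity = {sens:.4f}, specificity = {spec:.6f}")
print(f"LR+ = {positive_lr(sens, spec):.1f}, LR- = {negative_lr(sens, spec):.3f}")
print(f"PPV at prevalence 1/10,000 = {adjusted_ppv(sens, spec, 1e-4):.4f}")
```

With numbers of this shape, the sketch shows how a low sensitivity and a very high specificity can produce a large positive likelihood ratio alongside a negative likelihood ratio close to 1, and why the PPV remains modest when the condition is rare.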