Abstract
Prevalence estimates based on primary care health data identify cases using code lists. Validation studies can detect and exclude false positives, but finding false negatives is often difficult or impossible. Using psoriatic arthritis (PsA) as an example, this study aimed to examine the extent of misclassification and to adjust for it by linking primary care records with text-mined outpatient letters from a regional hospital in the Northwest (2014-2019). Two hundred forty-five cases of PsA were identified among 188 286 adults registered with primary care, giving an observed prevalence of 0.13% [95% CI, 0.11%-0.15%]. Among a subgroup of 7532 primary care patients attending the hospital rheumatology clinic, 202 had a primary care PsA code: 188 were confirmed as true PsA, and 14 were false positives. Primary care codes failed to identify 196 hospital-diagnosed PsA cases, leading to a more than 2-fold underestimation. The adjusted prevalence, accounting for misclassification, was 0.25% [95% CI, 0.21%-0.28%]. Linking primary care and hospital records identified both false positives and false negatives, enabling correction of prevalence estimates. This highlights the value of text-mining hospital letters to compensate for the national absence of coded secondary care diagnosis data from outpatient departments, and the importance of considering the impact of false negatives.
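One common way to make such an adjustment is to multiply the observed case count by the positive predictive value (PPV) and divide by the sensitivity, both estimated in the validation subgroup. The sketch below assumes that approach; the abstract does not state the exact correction method or how the confidence interval was derived. The function name `adjusted_prevalence` is hypothetical, and the sketch uses only the counts reported above. It reproduces the point estimate of about 0.25% but does not compute the interval.

```python
# Sketch: misclassification-adjusted prevalence, ASSUMING a simple
# PPV/sensitivity correction. Counts are those reported in the abstract;
# the function name is hypothetical, not the authors' code.

def adjusted_prevalence(observed_cases, population, true_pos, false_pos, false_neg):
    """Return (observed, adjusted) prevalence as proportions.

    PPV = TP / (TP + FP) discounts the expected false positives;
    sensitivity = TP / (TP + FN) scales up for cases the codes miss.
    """
    ppv = true_pos / (true_pos + false_pos)
    sensitivity = true_pos / (true_pos + false_neg)
    adjusted_cases = observed_cases * ppv / sensitivity
    return observed_cases / population, adjusted_cases / population


observed, adjusted = adjusted_prevalence(
    observed_cases=245, population=188_286,
    true_pos=188, false_pos=14, false_neg=196,
)
print(f"observed {observed:.2%}, adjusted {adjusted:.2%}")
# prints "observed 0.13%, adjusted 0.25%"
```

Under this correction the observed count is scaled by PPV/sensitivity = 384/202 ≈ 1.90, so low sensitivity dominates: the 196 missed cases shift the estimate far more than the 14 false positives.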