Abstract
BACKGROUND: Significant volumes of research rely on secondary care diagnostic coding to identify comorbidities; however, little is known about its accuracy at a population level or whether inaccuracy influences subsequent analysis. METHODS: Retrospective observational study utilising real-world data for three cohorts (all cancers, prostate cancer and breast cancer) of patients diagnosed at Leeds Cancer Centre between 2005 and 2018. Three different data definitions were used to identify patients with diabetes in each cohort: (1) clinical coding alone, (2) HbA1c blood test alone, (3) either clinical coding or abnormal HbA1c. Cohort characteristics, diagnosis dates and Cox-derived survival were compared across diabetes definitions. RESULTS: 123,841 cancer patients were identified, including 13,964 with diabetes. Clinical coding failed to identify 14.6% of diabetic cancer patients, with a temporal misclassification rate of 17.5%. Sole reliance on clinical coding overestimated the negative effect of diabetes mellitus (DM) on median survival across all cancers, and by 3.17 years in breast cancer. DISCUSSION: Clinical coding provides inaccurate diabetes detection and diagnosis dates, resulting in meaningful differences in analytic outcomes. This supports the use of more detailed comorbidity data definitions. These results cast doubt on research reliant on hospital clinical coding alone and on the generalisability of some comorbidity and frailty scoring systems.
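The three diabetes definitions in the METHODS can be illustrated with a minimal sketch. This is not the study's code. The `Patient` record, its field names and the helper functions are hypothetical. The 48 mmol/mol cut-off is an assumption, since the abstract says only "abnormal HbA1c". The denominator used for the missed-case fraction (all patients meeting definition 3) is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed cut-off: the abstract says only "abnormal HbA1c"; 48 mmol/mol is a
# commonly used diagnostic threshold, adopted here purely for illustration.
HBA1C_THRESHOLD_MMOL_MOL = 48.0


@dataclass
class Patient:
    """Hypothetical per-patient record; field names are illustrative."""
    patient_id: str
    has_diabetes_code: bool            # diabetes code present in hospital clinical coding
    max_hba1c: Optional[float] = None  # highest recorded HbA1c (mmol/mol); None if never tested


def by_coding(p: Patient) -> bool:
    """Definition (1): clinical coding alone."""
    return p.has_diabetes_code


def by_hba1c(p: Patient) -> bool:
    """Definition (2): HbA1c blood test alone."""
    return p.max_hba1c is not None and p.max_hba1c >= HBA1C_THRESHOLD_MMOL_MOL


def by_either(p: Patient) -> bool:
    """Definition (3): either clinical coding or abnormal HbA1c."""
    return by_coding(p) or by_hba1c(p)


def missed_by_coding(patients: List[Patient]) -> float:
    """Fraction of patients meeting definition (3) that definition (1) misses."""
    diabetic = [p for p in patients if by_either(p)]
    if not diabetic:
        return 0.0
    return sum(not by_coding(p) for p in diabetic) / len(diabetic)


# Toy cohort (made-up values, not study data).
cohort = [
    Patient("a", True, 60.0),   # coded, abnormal HbA1c
    Patient("b", False, 55.0),  # not coded, abnormal HbA1c: missed by coding
    Patient("c", True, None),   # coded, never tested
    Patient("d", False, 40.0),  # not coded, normal HbA1c
    Patient("e", False, None),  # not coded, never tested
]
print(f"Missed by coding alone: {missed_by_coding(cohort):.1%}")
```

In this toy cohort, three patients meet definition (3) and one of them carries no diabetes code, so coding alone misses a third of cases. The same comparison applied to real records would yield a figure analogous to the 14.6% reported in the RESULTS.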