Development of an algorithm for ethnicity recording in cohorts from the UK Clinical Practice Research Datalink primary care and linked Hospital Episode Statistics databases

Abstract

OBJECTIVE: To evaluate prioritisation strategies within an algorithm designed to ascertain the most likely ethnicity, and to create a standardised methodology to benefit future research.

DESIGN: Retrospective cohort study.

SETTING: The Clinical Practice Research Datalink (CPRD) primary care and linked Hospital Episode Statistics (HES) data sets.

PARTICIPANTS: The population of 54 029 174 patients included all acceptable patients registered at English practices in CPRD GOLD or CPRD Aurum from the May 2023 and May 2022 builds, respectively.

PRIMARY OUTCOME MEASURE: Ethnicity data within the CPRD and HES data sets were identified using established code lists and then categorised into broader ethnic groups. A previously used algorithm was modified to assess the effect of each change on ethnic categorisation. The modifications were: prioritising primary over secondary care data, prioritising recent over frequent records, and prioritising 'non-Other' ethnicity categories. Four combinations of data sources were examined: CPRD with all HES data sets, CPRD with HES Admitted Patient Care (APC) only, CPRD only, and HES APC only. Ethnic distributions from these variations were compared using counts and percentages, and inter-rater reliability was evaluated using Cohen's kappa. Sensitivity analyses repeated the comparisons using only currently registered patients and after removing cases with unknown ethnicity. Ethnic distributions were also compared with the English Census 2021.

RESULTS: Agreement in ethnicity distributions was almost perfect whether or not primary care data were prioritised over secondary care data (kappa=1.0000, SE=0.0001), whether the most frequently or the most recently recorded data were prioritised (kappa=0.9824, SE=0.0001), and whether or not 'non-Other' categories were prioritised (kappa=0.9705, SE=0.0001). Agreement was moderate when data were drawn from a single data source (CPRD only: kappa=0.5554, SE=0.0001; HES APC only: kappa=0.5526, SE=0.0001) rather than from combined data sources (CPRD and HES data sets).

CONCLUSIONS: All variations of the algorithm produced similar population-level ethnicity distributions. Versions using data from multiple sources had higher inter-rater reliability than those using a subset of sources; however, varying the hierarchical decision-making of the ethnicity algorithm made little difference to the categorisations produced. The CPRD population was representative of the English population in terms of ethnicity. While researchers should remain mindful of the limitations of these data, the CPRD Ethnicity Records provide a standardised and pragmatic approach to ascertaining ethnicity for future research.
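
The prioritisation rules and the agreement statistic named in the abstract can be illustrated with a minimal Python sketch. It is not the CPRD algorithm: the record layout `(ethnic_group, record_date, source)`, the function names `pick_ethnicity` and `cohens_kappa`, the order in which the rules are applied and the tie-breaking are all assumptions made for illustration. The kappa function is the standard unweighted Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter
from datetime import date

# Hypothetical record layout (not the CPRD/HES schema):
#   (ethnic_group, record_date, source), source is "primary" or "secondary".

def pick_ethnicity(records, prefer_recent=False, prefer_non_other=False,
                   prefer_primary=False):
    """Return one ethnic group for a patient, or "Unknown" if no records."""
    if not records:
        return "Unknown"
    pool = list(records)
    if prefer_primary:
        # Use primary care records when any exist, else fall back to all.
        pool = [r for r in pool if r[2] == "primary"] or pool
    if prefer_non_other:
        # Use specific categories when any exist, else fall back to "Other".
        pool = [r for r in pool if r[0] != "Other"] or pool
    counts = Counter(group for group, _, _ in pool)
    latest = {}
    for group, when, _ in pool:
        latest[group] = max(latest.get(group, when), when)
    if prefer_recent:
        # Most recent first; frequency, then name, break ties (assumption).
        return max(counts, key=lambda g: (latest[g], counts[g], g))
    # Most frequent first; recency, then name, break ties (assumption).
    return max(counts, key=lambda g: (counts[g], latest[g], g))

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two label lists of equal length."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both variants put every patient in the same category
    return (observed - expected) / (1.0 - expected)

# Toy cohort: each patient is a list of records.
cohort = [
    [("White", date(2010, 1, 1), "primary"),
     ("White", date(2012, 1, 1), "secondary"),
     ("Asian", date(2020, 1, 1), "primary")],
    [("Other", date(2015, 1, 1), "primary"),
     ("Other", date(2016, 1, 1), "primary"),
     ("Black", date(2018, 1, 1), "secondary")],
]
by_frequency = [pick_ethnicity(p) for p in cohort]
by_recency = [pick_ethnicity(p, prefer_recent=True) for p in cohort]
kappa = cohens_kappa(by_frequency, by_recency)
```

Running two variants of the selection rule over the same cohort and passing the two label lists to `cohens_kappa` mirrors, in miniature, how pairs of algorithm versions are compared in the abstract; the toy records above are invented and carry no empirical meaning.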
