The challenges of integrating diverse data sources: A case study in major depression

Abstract

Combining data from diverse sources, including randomized controlled trials (RCTs) and observational datasets, holds the potential to increase sample size, improve external validity, and provide a well-rounded view of the question under study. However, integrating different data sources can be complicated in practice, particularly when the data were collected across sites and institutions. In this paper, we use a case study of data from four RCTs and two electronic health record (EHR) systems to illustrate some of the challenges that can arise when combining these sources of data. We group the challenges into cohort-related and variable-related challenges, and for each challenge, we provide descriptive statistics and visuals from our case study to show the decisions that must be made and their implications. We provide guidance for researchers on the most important considerations and emphasize the necessity of careful, documented decision-making by an interdisciplinary team. Through this case study and associated reflections, we highlight the dangers of naively combining data and advocate for discussion and clear communication of the decisions made at each step in the data combination process, as well as the limitations and implications of those decisions.
