Record linkage in public health datasets: a practical experience in a fast in-process analytical database

Abstract

OBJECTIVE: This study evaluates the accuracy of a hybrid deterministic-probabilistic algorithm, implemented in DuckDB, for linking records from the Mortality Information System (SIM) and the Influenza Epidemiological Surveillance Information System (SIVEP-Gripe). METHODS: The proposed algorithm was compared with a previously validated algorithm across different prevalence scenarios. The hybrid approach combines deterministic and probabilistic matching and uses string similarity metrics such as Jaro and Jaro-Winkler. RESULTS: The proposed algorithm maintained high sensitivity, specificity and predictive values while offering superior processing speed and scalability. The DuckDB-based solution achieved execution times up to one hundred times shorter, making it particularly suitable for large-scale, real-time applications. CONCLUSIONS: This study underscores the potential of DuckDB as a high-performance analytical database for efficiently managing complex data integration tasks, and highlights its suitability for resource-limited public health settings, where timely and accurate record linkage is often essential.
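To make the hybrid approach more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not name the linkage fields, the thresholds, or the exact rules, so the `Person` fields (`name`, `mother_name`, `birth_date`), the 0.92 threshold, and the `is_link` rule are all illustrative assumptions. The fuzzy step here is a simple similarity threshold standing in for the probabilistic component. DuckDB itself provides `jaro_similarity` and `jaro_winkler_similarity` as built-in SQL functions. The metrics are written out by hand here only so that the example is self-contained.

```python
from dataclasses import dataclass


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


@dataclass(frozen=True)
class Person:
    # Hypothetical linkage fields; the abstract does not list the real ones.
    record_id: str
    name: str
    mother_name: str
    birth_date: str  # ISO date, e.g. "1950-03-01"


def is_link(a: Person, b: Person, threshold: float = 0.92) -> bool:
    """Illustrative hybrid rule: exact match first, then fuzzy fallback."""
    # Deterministic step: exact agreement on every field.
    if (a.name, a.mother_name, a.birth_date) == (b.name, b.mother_name, b.birth_date):
        return True
    # Fuzzy step: block on birth date, then compare names by similarity.
    if a.birth_date != b.birth_date:
        return False
    score = (jaro_winkler(a.name, b.name)
             + jaro_winkler(a.mother_name, b.mother_name)) / 2
    return score >= threshold  # threshold is an assumed value


if __name__ == "__main__":
    sim = Person("sim-1", "MARIA DA SILVA", "ANA DA SILVA", "1950-03-01")
    sivep = Person("sivep-2", "MARIA DA SILVS", "ANA DA SILVA", "1950-03-01")
    print(is_link(sim, sivep))
```

In a DuckDB setting, the same logic would typically be expressed as a join between the two tables, with the exact-match rule and the similarity threshold in the join condition or a filter. Blocking on an exactly matching field, as done here with the birth date, keeps the number of fuzzy comparisons manageable.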
