Abstract
OBJECTIVE: To assess the accuracy of a hybrid deterministic-probabilistic algorithm, implemented in DuckDB, for linking records from the Mortality Information System (SIM) and the Influenza Epidemiological Surveillance Information System (SIVEP-Gripe). METHODS: The proposed algorithm combines deterministic and probabilistic matching, using string similarity metrics such as Jaro and Jaro-Winkler. It was compared with a previously validated algorithm under different prevalence scenarios. RESULTS: The DuckDB-based solution maintained high sensitivity, specificity, and predictive values while offering superior processing speed and scalability. Execution times were up to one hundred times shorter, making the solution particularly suitable for large-scale, real-time applications. CONCLUSIONS: DuckDB shows potential as a high-performance analytical database for managing complex data integration tasks efficiently, and it is well suited to resource-limited public health settings, where timely and accurate record linkage is often essential.
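To illustrate the kind of hybrid linkage the abstract describes, the following Python sketch pairs an exact-match step with a Jaro-Winkler fallback and computes the accuracy measures named above. It is not the study's DuckDB implementation: the field names (`name`, `mother`, `dob`), the 0.92 threshold, and the equal weighting of the two name scores are assumptions chosen only for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


def is_link(a: dict, b: dict, threshold: float = 0.92) -> bool:
    """Hybrid rule (illustrative assumption, not the study's exact rule)."""
    # Deterministic step: all hypothetical key fields agree exactly.
    if all(a[f] == b[f] for f in ("name", "mother", "dob")):
        return True
    # Probabilistic step: same birth date and similar names.
    if a["dob"] != b["dob"]:
        return False
    score = (jaro_winkler(a["name"], b["name"])
             + jaro_winkler(a["mother"], b["mother"])) / 2
    return score >= threshold


def accuracy_metrics(predicted: list, truth: list) -> dict:
    """Sensitivity, specificity, PPV and NPV against a gold standard.

    Assumes both classes occur in `truth` and in `predicted`.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

In this sketch, a pair that differs only by a transposed letter in the name still links through the probabilistic step, while a pair with different birth dates never does. Predictive values depend on the share of true matches among candidate pairs, which is why comparing algorithms under several prevalence scenarios is informative.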