Abstract
The growing volume and heterogeneity of patient care data make comprehensive analysis and insight generation difficult, particularly in specific domains such as respiratory diseases. Standardizing diverse health data is crucial for enabling large-scale observational research and ensuring data readiness. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a widely adopted standard for harmonizing such data. However, the quality of data transformed into the OMOP CDM format must be evaluated before the data are used in research or clinical decision support. This study evaluates the impact of the OMOP CDM standardization process on generating data quality insights for a respiratory disease dataset. The source dataset was originally paper-based, was converted to an electronic format, and was translated from French into English. This historical dataset covers the years 2009-2023 and contains 108 variables and 2,154 records. The data were converted to the OMOP CDM format through a standard Extract, Transform, and Load (ETL) process, and the quality of the resulting OMOP CDM instance was then assessed. We used the Data Quality Dashboard (DQD) to evaluate the quality of the OMOP CDM database before and after ETL verification. DQD runs validation checks organized around key data quality dimensions, including completeness, plausibility, and conformance. Before ETL verification, the assessment of the Respiratory Diseases Inpatients data ran 2,344 checks, of which 2,269 passed and 75 failed, giving a corrected pass rate of 96%. After ETL verification, the assessment ran 2,374 checks, of which 2,356 passed and 40 failed, giving a corrected pass rate of 100%. Standardizing respiratory disease data using the OMOP CDM enabled a structured and transparent evaluation of data quality.
By applying the DQD, this study demonstrated the utility of the OMOP CDM in generating meaningful data quality insights. These findings highlight the model's potential to enhance data readiness and support evidence-based decision-making in respiratory disease management.