Abstract
During the 2022-2023 cholera outbreak in Lebanon, cases were reported through the District Health Information System 2 (DHIS2). We developed automated procedures in the R programming language to improve the completeness of routinely notified variables, apply case definition criteria, improve geographic accuracy, and improve the documentation of laboratory results. Our R scripts cleaned, standardized, and reclassified the data, plotted epidemic curves, and produced maps of cholera incidence rates and rapid diagnostic test (RDT) coverage by district. We shared the scripts on GitHub for open adaptation and use. Before cleaning, missingness reached 99.7% for inpatient status and 17-35% for other key variables; after cleaning, all fields were complete. As initially notified through DHIS2, 92.8% of cases were suspected and 7.2% were confirmed. After reclassification, 40% were classified as suspected, 5.8% as confirmed, and 48.6% as unspecified. Laboratory data showed that 5.8% of cases were culture positive, 2.2% were RDT positive, and 65.1% had no documented testing. Among facility-entered cases (n = 5953), 11.4% were reported from a governorate other than the patient's governorate of residence. During the outbreak, the daily maps were generated by place of residence. Integrating R-based analytics with DHIS2 improved data completeness and case classification and enabled better spatial and laboratory analysis. This combined approach gave a clearer epidemiological picture of the cholera outbreak, supported data-driven public health decision-making, and highlights the value of integrating analytical tools with routine surveillance systems.
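To illustrate the kinds of checks summarized above, the following minimal Python sketch does three things. It computes per-field missingness, applies a case classification rule, and measures the share of cases reported from a governorate other than the patient's governorate of residence. The study's own scripts were written in R. Everything here is an assumption for illustration only: the field names (`inpatient_status`, `culture_result`, `acute_watery_diarrhoea`, `reporting_governorate`, `residence_governorate`), the helper functions, and the classification rule. None of it reproduces the authors' code or case definitions.

```python
"""Illustrative sketch of surveillance-data checks (hypothetical fields and rules)."""


def percent_missing(records: list[dict], field: str) -> float:
    """Percentage of records whose value for `field` is absent, None, or an empty string."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return 100.0 * missing / len(records)


def classify(record: dict) -> str:
    """Hypothetical rule, NOT the study's case definition:
    culture-positive -> "confirmed"; acute watery diarrhoea without a
    positive culture -> "suspected"; anything else -> "unspecified"."""
    if record.get("culture_result") == "positive":
        return "confirmed"
    if record.get("acute_watery_diarrhoea") is True:
        return "suspected"
    return "unspecified"


def cross_governorate_share(records: list[dict]) -> float:
    """Among records with both governorates filled in, the fraction whose
    reporting governorate differs from the governorate of residence."""
    known = [
        r for r in records
        if r.get("reporting_governorate") and r.get("residence_governorate")
    ]
    if not known:
        return 0.0
    differ = sum(
        1 for r in known if r["reporting_governorate"] != r["residence_governorate"]
    )
    return differ / len(known)


# Synthetic example records (not study data); governorate names are placeholders.
records = [
    {"inpatient_status": None, "culture_result": "positive", "acute_watery_diarrhoea": True,
     "reporting_governorate": "Gov_A", "residence_governorate": "Gov_A"},
    {"inpatient_status": "", "culture_result": None, "acute_watery_diarrhoea": True,
     "reporting_governorate": "Gov_A", "residence_governorate": "Gov_B"},
    {"inpatient_status": "yes", "culture_result": "negative", "acute_watery_diarrhoea": False,
     "reporting_governorate": "Gov_B", "residence_governorate": "Gov_B"},
]

print(f"inpatient_status missing: {percent_missing(records, 'inpatient_status'):.1f}%")
print([classify(r) for r in records])
print(f"reported outside governorate of residence: {cross_governorate_share(records):.1%}")
```

The classification rule lives in a single function, so the actual case definition criteria could be substituted without touching the rest of the pipeline. In R, the same branching logic could be expressed with `dplyr::case_when()`.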