Abstract
BACKGROUND: The detection of communicable disease clusters in genomic surveillance data typically involves the application of rule-based signaling criteria, which can be arbitrary. In contrast, scan statistics that are used for spatiotemporal cluster detection can flexibly scan in calendar time, and scan statistics that are used for pharmacovigilance can flexibly scan along hierarchical tree structures that are based on diagnosis codes.

METHODS: New York City (NYC) Health Department staff applied tree-temporal scan statistics prospectively to genomic surveillance data with a hierarchical nomenclature for COVID-19 and salmonellosis cases that were diagnosed among NYC residents. We searched weekly for recent case increases at any granularity, from large phylogenetic branches to small groups of indistinguishable isolates. Using free and open-source TreeScan software, we looked for emerging SARS-CoV-2 variants based on Pango lineages during August 2021–November 2023 and emerging clusters of Salmonella isolates based on allele codes during November 2022–November 2023.

RESULTS: The SARS-CoV-2 Omicron subvariant EG.5.1 first signaled as locally emerging on 22 June 2023, 7 weeks before the World Health Organization designated it as a variant of interest. During 1 year of salmonellosis analyses, TreeScan detected 15 credible clusters that were worth investigating for common exposures and two data-quality issues for correction.

CONCLUSION: A challenge was the maintenance of timely and specific lineage assignments, and a limitation was that genetic distances between tree nodes were not considered. By automatically sifting through genomic data and generating ranked shortlists of nodes with statistically unusual recent case increases, TreeScan assisted in detecting emerging variants and clusters of communicable diseases and in prioritizing them for investigation.