Abstract
BACKGROUND: Whole genome sequence data can generate insights about Mycobacterium tuberculosis (Mtb) transmission. We used whole genome sequencing and linked epidemiology data from a recent randomized trial to characterize Mtb relatedness across 3 geographically distinct South African sites.

METHODS: We sequenced culture isolates from participants with culture-positive tuberculosis in the Kharituwe study, which evaluated household contact investigation strategies in 1 urban and 2 rural sites. We adapted a previous bioinformatic pipeline to clean, extract, and filter Mtb reads; perform reference alignment; calculate single-nucleotide polymorphism (SNP) distances between isolates; and group isolates into clusters linked by recent transmission based on 3 SNP-based cutoffs. Sequence data were linked to individual data on demographics and risk factors. We analyzed clustering across and within study sites and used log-binomial regression to assess characteristics associated with clustering.

RESULTS: At a cutoff of 12 SNPs, 213 of 714 sequenced isolates passing quality control filters were clustered. While only 3 of 45 pairs included participants from different sites, the majority of clusters with ≥4 participants included representation from at least 2 sites. Expanding to a 20-SNP cutoff revealed a large cluster containing 10% of isolates, with urban/rural representation mirroring that of all the isolates (61% urban, 39% rural). Participants from the urban site, TB household contacts, and participants reporting a history of incarceration were more likely to be in a cluster.

CONCLUSIONS: Observed clustering and strain diversity across sites indicate the presence of multiple ongoing and geographically dispersed outbreaks in this setting.
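As an illustration of the cutoff-based grouping described in the methods, the sketch below computes pairwise SNP distances between aligned sequences and groups isolates whose distance falls at or below a cutoff. It is a minimal sketch, not the study's pipeline: it assumes single-linkage grouping (an isolate joins a cluster if it is within the cutoff of any member), which the abstract does not specify, and the function names `snp_distance` and `cluster_by_cutoff`, the isolate IDs, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both calls are unambiguous and differ."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def cluster_by_cutoff(sequences, cutoff):
    """Group isolates by single linkage at a SNP cutoff (assumed linkage rule).

    sequences: dict mapping isolate ID -> aligned sequence string.
    Returns a list of clusters (sorted ID lists) with at least 2 members.
    """
    ids = list(sequences)
    parent = {i: i for i in ids}  # union-find forest

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of isolates within the cutoff.
    for a, b in combinations(ids, 2):
        if snp_distance(sequences[a], sequences[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Unclustered isolates (singletons) are dropped.
    return [sorted(g) for g in groups.values() if len(g) >= 2]


# Hypothetical toy alignment; real inputs would be genome-wide SNP calls.
toy = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGTACGAAT",
    "iso4": "TTTTTCGAAA",
}
clusters_at_1 = cluster_by_cutoff(toy, 1)
clusters_at_5 = cluster_by_cutoff(toy, 5)
```

Under single linkage, raising the cutoff can only merge clusters or add isolates to them, which is consistent with a larger cluster emerging when the cutoff is expanded from 12 to 20 SNPs. A different linkage rule would give different groupings at the same cutoff.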