Abstract
BACKGROUND: Mycobacterium tuberculosis (Mtb) genomic epidemiology often relies on culturing patient sputum, a time- and labor-intensive process. Hybrid capture approaches have been used successfully to enrich Mtb DNA from complex clinical samples, yet the accuracy of variant identification from captured samples has not been systematically evaluated.

METHODS: We created artificial mixtures of two well-characterized Mtb isolates in which the minor strain comprised 0-50% of Mtb DNA, and serially diluted the Mtb DNA into human DNA to simulate diagnostic samples with different sputum smear burdens (32 samples). We also prospectively collected paired Mtb diagnostic cultures and sputa submitted to a national diagnostic laboratory (7 sample pairs). We performed hybrid capture and Illumina whole genome sequencing on all samples. For the artificial strain mixtures, we measured hybrid capture efficiency (the percentage of total reads mapping to Mtb) and the performance of fixed and minority variant identification. For the diagnostic samples, we compared the number, identity, and minor allele frequencies of minority variants identified in the cultured and hybrid-captured samples.

RESULTS: In the artificial strain mixture experiment, hybrid capture efficiency was 97% when Mtb comprised 0.01% of the input DNA. At that dilution, single nucleotide polymorphism (SNP) identification via hybrid capture had a sensitivity ≥ 91% and a precision ≥ 97% for Mtb lineages 4.1.2.1 and 4.9, excluding PE/PPE genes. Observed minor allele frequencies were correlated with input minor allele frequencies across all dilutions (r = 0.60 to 0.79, p < 0.001). Among paired diagnostic samples, hybrid capture efficiency was high (95%). However, four of the seven captured sputum samples were overwhelmed by Pseudomonas contamination, which comprised >25% of sequence reads. We did not detect a significant difference in the number of minority variants identified in cultured (median 14, range 9-23) and hybrid-captured samples (median 22, range 4-328; p = 0.2), and their minor allele frequencies were correlated (r = 0.95, p < 0.001).

CONCLUSIONS: Hybrid capture of diagnostic sputum samples efficiently generates accurate Mtb whole genome sequences and minority variant calls. Hybrid capture may offer an alternative to culture-based sequencing that could extend the coverage of genomic epidemiology studies.
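The evaluation metrics named above (capture efficiency, SNP sensitivity and precision with PE/PPE genes excluded, minor allele frequency, and the correlation between observed and expected frequencies) can be illustrated with a minimal sketch. It assumes that capture efficiency is Mtb-mapped reads divided by total reads, that SNP calls are compared to a truth set as (position, alternate allele) pairs, that minor allele frequency is the depth of the second most common allele over total depth at a site, and that the reported r is Pearson's r. All function names, the masked interval, and the toy numbers are hypothetical and are not the authors' pipeline or data.

```python
"""Hypothetical sketch of the evaluation metrics described in the abstract."""
from math import sqrt


def capture_efficiency(mtb_reads: int, total_reads: int) -> float:
    """Percentage of all sequenced reads that map to the Mtb reference (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mtb_reads / total_reads


def in_masked(pos: int, masked) -> bool:
    """True if pos lies in any closed interval [start, end], e.g. hypothetical PE/PPE genes."""
    return any(start <= pos <= end for start, end in masked)


def sensitivity_precision(called, truth, masked=()):
    """Compare SNP calls to a truth set after dropping sites in masked regions.

    called, truth: sets of (position, alt_allele) tuples.
    Returns (sensitivity, precision) as fractions; NaN when undefined.
    """
    called = {v for v in called if not in_masked(v[0], masked)}
    truth = {v for v in truth if not in_masked(v[0], masked)}
    tp = len(called & truth)   # true SNPs that were called
    fn = len(truth - called)   # true SNPs that were missed
    fp = len(called - truth)   # calls absent from the truth set
    sens = tp / (tp + fn) if tp + fn else float("nan")
    prec = tp / (tp + fp) if tp + fp else float("nan")
    return sens, prec


def minor_allele_frequency(allele_depths: dict) -> float:
    """Fraction of reads at a site supporting the second most common allele (assumed definition)."""
    depths = sorted(allele_depths.values(), reverse=True)
    total = sum(depths)
    if total == 0 or len(depths) < 2:
        return 0.0
    return depths[1] / total


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    print(capture_efficiency(970, 1000))  # 97.0
    truth = {(100, "A"), (200, "G"), (300, "T"), (5000, "C")}
    called = {(100, "A"), (200, "G"), (400, "C"), (5000, "C")}
    pe_ppe = [(4900, 5100)]  # hypothetical PE/PPE interval, so site 5000 is ignored
    print(sensitivity_precision(called, truth, pe_ppe))  # sensitivity = precision = 2/3 here
    print(minor_allele_frequency({"A": 90, "G": 10}))
    print(pearson_r([0.0, 0.1, 0.2, 0.5], [0.01, 0.12, 0.18, 0.47]))
```

Masking is applied to both the calls and the truth set so that excluded regions count neither as misses nor as false positives. This mirrors the abstract's "excluding PE/PPE genes" only under the stated assumption.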