Abstract
This study addresses the increasing resistance of Mycobacterium tuberculosis (MTB) to the FDA-approved drug bedaquiline (BDQ). The absence of any defined resistance locus and the wide variation in drug targets across clinical isolates leave the molecular basis of BDQ resistance acquisition poorly understood. Using machine learning (ML) methods, BDQ resistance was predicted from whole-genome sequencing data for MTB clinical isolates. Variant Call Format (VCF) data were generated in several steps, including adapter trimming and alignment to the H37Rv reference genome. The ML models, namely Multilayer Perceptron and Random Forest (RF), achieved accuracies of 83.60% and 79.64%, respectively. The top 50 features were mapped to the H37Rv reference genome; they covered coding regions as well as some non-coding intergenic regions. This mapping identified several new drug targets and revealed 15 new antibiotic resistance genes. In addition, explainable AI (XAI) methods, such as SHapley Additive exPlanations (SHAP), facilitated the identification of mutations associated with BDQ resistance. In conclusion, the ML models predicted BDQ resistance effectively, and XAI contributed to understanding key resistance features.
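As an illustration of how per-isolate VCF data can become input for models such as RF or Multilayer Perceptron, the following minimal sketch encodes each isolate as a binary presence/absence vector over the variants observed in the cohort. This encoding is an assumption made for illustration, since the abstract does not specify the feature representation; the function names, isolate names, positions, and alleles are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A variant is identified by (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


def parse_vcf_variants(vcf_text: str) -> Set[Variant]:
    """Collect (POS, REF, ALT) triples from the text of a single-sample VCF."""
    variants: Set[Variant] = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.split("\t")
        pos, ref, alts = int(fields[1]), fields[3], fields[4]
        # Multi-allelic sites list their ALT alleles comma-separated.
        for alt in alts.split(","):
            if alt != ".":
                variants.add((pos, ref, alt))
    return variants


def build_feature_matrix(
    per_isolate: Dict[str, Set[Variant]],
) -> Tuple[List[Variant], Dict[str, List[int]]]:
    """Return sorted variant columns and a 0/1 row per isolate."""
    columns = sorted(set().union(*per_isolate.values()))
    rows = {
        isolate: [1 if variant in found else 0 for variant in columns]
        for isolate, found in per_isolate.items()
    }
    return columns, rows


# Toy records with made-up positions and alleles (not real resistance sites).
# Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
vcf_a = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_000962.3\t100\t.\tA\tG\t50\tPASS\t.\n"
    "NC_000962.3\t250\t.\tC\tT\t60\tPASS\t.\n"
)
vcf_b = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_000962.3\t250\t.\tC\tT\t55\tPASS\t.\n"
    "NC_000962.3\t400\t.\tG\tA,C\t70\tPASS\t.\n"
)

columns, rows = build_feature_matrix(
    {
        "isolate_a": parse_vcf_variants(vcf_a),
        "isolate_b": parse_vcf_variants(vcf_b),
    }
)
```

With an encoding of this kind, each column corresponds to a genomic position, so features ranked highly by a trained model can be traced back to coordinates on the reference genome.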