Abstract
Protease inhibitors (PIs) target the protease (PR) enzyme to suppress viral replication, but their efficacy in the treatment of human immunodeficiency virus (HIV) infection is compromised by the emergence of drug-resistant strains. Forecasting drug resistance during viral evolution would therefore help in designing effective treatment strategies. To this end, we develop a framework that bridges two distinct data sets. First, we train probabilistic models to learn coevolutionary information from observed PR genotypes under different PI treatment regimens, and we use these models to infer the transition probabilities of point mutations conditioned on the genotype and the treatment regimen. Second, we train another set of models on clinically measured drug resistance data to infer the resistance of PR genotypes to different PIs. Used together, these models allow us to simulate evolutionary trajectories and predict drug resistance along them. Importantly, we use these simulations to forecast the emergence of persistent drug-resistant genotypes. Our analysis shows that the dual therapy of Atazanavir (ATV) and Ritonavir (RTV) is the multi-PI treatment regimen least likely to induce drug resistance. We also conduct an exhaustive ablation study of all possible mutations and predict seven point mutations to be critical for drug resistance. Interestingly, our results highlight the importance of the L63P amino acid polymorphism, which we predict to be critical for the development of resistance to Nelfinavir (NFV). These results show that our framework effectively extracts and combines biological information from the distinct data sets of observed genotypes and drug resistance. At the same time, it addresses the challenge posed by the sparsity of available sequence data relative to the large combinatorial complexity of protein evolution and of changing functionality in dynamic environments.