Abstract
INTRODUCTION: Hepatitis C virus (HCV) infection remains highly prevalent in Pakistan, particularly among patients with multiple comorbid conditions. Despite the widespread availability of direct-acting antivirals (DAAs), practical machine learning (ML) approaches for predicting sustained virological response (SVR) are still lacking in resource-limited settings. METHODS: This retrospective cohort study analyzed 221 HCV patients with comorbidities who were treated with Sofosbuvir + Daclatasvir ± Ribavirin combination therapy. Baseline demographic and laboratory parameters were preprocessed using standard scaling methods. The dataset was split into training (70%) and testing (30%) subsets, and class imbalance in the training set was addressed using the synthetic minority oversampling technique (SMOTE). Five ML models (logistic regression, decision tree, random forest, XGBoost, and support vector machine [SVM]) were tuned using stratified five-fold cross-validation. Test-set performance was assessed using accuracy, precision, recall, specificity, F1-score, and area under the receiver operating characteristic curve (ROC-AUC), and SHapley Additive exPlanations (SHAP) analysis was conducted for the top-performing model. RESULTS: Of the 221 patients, 162 (73%) achieved SVR. Random forest and SVM showed the best discriminatory performance: random forest achieved the highest accuracy (0.73), precision (0.84), and F1-score (0.81), while SVM produced the highest recall (0.82) and ROC-AUC (0.76). Alanine aminotransferase (ALT) and aspartate aminotransferase (AST) consistently emerged as the strongest predictors associated with treatment failure. CONCLUSION: These findings support the potential of ML-based decision tools that use routine clinical data in high-burden, resource-limited settings to guide risk stratification, optimize monitoring intensity, and inform public health strategies for HCV control and elimination in Pakistan. They also highlight the need for broader validation across larger, multicenter cohorts.
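For readers less familiar with the evaluation metrics listed in the Methods, the following is a minimal Python sketch of how they can be derived from a binary confusion matrix and from predicted scores. It is an illustration only, not the study's code: it assumes SVR is treated as the positive class, the names `ConfusionMatrix`, `classification_metrics`, and `roc_auc` are hypothetical, and the counts and scores are made up rather than taken from the study data. ROC-AUC is computed here with the pairwise (Mann-Whitney) formulation, which may differ from the implementation the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts (assumption: SVR is the positive class)."""
    tp: int  # predicted SVR, achieved SVR
    fp: int  # predicted SVR, treatment failed
    tn: int  # predicted failure, treatment failed
    fn: int  # predicted failure, achieved SVR


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(cm):
    """Compute the threshold-based metrics named in the Methods."""
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)  # also called sensitivity
    return {
        "accuracy": safe_div(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(cm.tn, cm.tn + cm.fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test-set counts and scores, for illustration only.
example = ConfusionMatrix(tp=40, fp=8, tn=10, fn=9)
metrics = classification_metrics(example)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
print(f"roc_auc: {auc:.2f}")
```

Because about 73% of patients achieved SVR, accuracy alone can look acceptable for a model that rarely predicts failure. Specificity and ROC-AUC are reported alongside it for that reason.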