Abstract
BACKGROUND: The global prevalence of non-alcoholic steatohepatitis (NASH) and its associated risk of adverse outcomes, particularly in patients with advanced liver fibrosis, underscore the importance of early and accurate diagnosis.
AIM: To develop a machine learning-based diagnostic model for advanced liver fibrosis in NASH patients.
METHODS: A total of 749 patients who underwent liver biopsy at Beijing Ditan Hospital, Capital Medical University, between January 2010 and January 2020 were included. Patients were randomly divided into training (n = 522) and validation (n = 224) cohorts. Five machine learning models were applied to predict advanced liver fibrosis, with feature selection based on Shapley Additive Explanations (SHAP). The diagnostic performance of these models was compared with that of traditional scores, namely the aspartate aminotransferase to platelet ratio index (APRI) and the fibrosis-4 index (FIB-4), using the area under the receiver operating characteristic curve (AUROC), decision curve analysis (DCA), and calibration curves.
RESULTS: The Extreme Gradient Boosting (XGBoost) model outperformed all other machine learning models, achieving an AUROC of 0.934 (95% CI: 0.914-0.955) in the training cohort and 0.917 (95% CI: 0.880-0.953) in the validation cohort (P < 0.001). Incorporating liver stiffness measurement into the model further improved its performance, with an AUROC of 0.977 (95% CI: 0.966-0.980) in the training cohort and 0.970 (95% CI: 0.950-0.990) in the validation cohort, significantly surpassing the APRI and FIB-4 scores (P < 0.001). The XGBoost model also demonstrated superior clinical utility, as evidenced by DCA and calibration curve analysis in both cohorts.
CONCLUSION: The XGBoost model provides a highly accurate, non-invasive diagnosis of advanced liver fibrosis in NASH patients, outperforming traditional methods. An online tool based on this model has been developed to assist clinicians in evaluating the risk of advanced liver fibrosis.
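For readers unfamiliar with the comparator scores and the main metric, the following is a minimal Python sketch of the standard published APRI and FIB-4 formulas and of the AUROC computed as a pairwise ranking probability. It is not the study's code. The function names, the default AST upper limit of normal (40 U/L), and the assumed units (AST and ALT in U/L, platelets in 10^9/L, age in years) are illustrative assumptions that the abstract does not specify.

```python
import math


def apri(ast, platelets, ast_uln=40.0):
    """APRI = 100 * (AST / upper limit of normal AST) / platelet count.

    ast and ast_uln in U/L, platelets in 10^9/L.
    The default ast_uln of 40 U/L is an assumption; laboratories differ.
    """
    return 100.0 * (ast / ast_uln) / platelets


def fib4(age, ast, alt, platelets):
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT)).

    age in years, ast and alt in U/L, platelets in 10^9/L.
    """
    return (age * ast) / (platelets * math.sqrt(alt))


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5).

    labels: 1 = advanced fibrosis on biopsy, 0 = no advanced fibrosis.
    Quadratic in the number of cases, which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a worked example with made-up values, a 50-year-old with AST 80 U/L, ALT 64 U/L, and platelets 100 × 10^9/L has an APRI of 2.0 and a FIB-4 of 5.0 under these formulas. The same `auroc` function can score any continuous predictor against biopsy labels, including a model's predicted probabilities.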