Abstract
BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a major global health problem that imposes substantial clinical and economic burdens. Reliable predictive tools are needed for early identification and intervention.
METHODS: Using data from the Dryad database, we developed and validated a clinical prediction model for NAFLD that incorporates key clinical parameters. A total of 1,592 subjects were randomly split into training and validation sets. We fitted a logistic regression model on the training set, visualized and internally validated it in R, and estimated its net benefit with decision curve analysis. The model was then evaluated on the held-out validation set using F1 score, precision, and recall.
RESULTS: The model showed good discrimination, with an area under the receiver operating characteristic curve (AUC) of 0.80 (95% confidence interval: 0.77-0.82) in the training set and 0.78 in the validation set. Calibration showed close agreement between predicted and observed risks, with mean absolute errors of 0.016 (training) and 0.012 (validation). Additional classification metrics (F1 score: 0.76, precision: 0.71, recall: 0.82) supported its robustness and potential clinical value.
CONCLUSION: We developed an NAFLD prediction model with good discrimination and calibration. Because it relies on readily available clinical data, the model offers a scalable, low-cost approach to NAFLD risk assessment that could support personalized prevention and efficient resource allocation.
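The reported classification metrics and the decision curve analysis can be illustrated with a short sketch. It is written in Python purely for illustration (the study itself used R), and the function names `f1_score` and `net_benefit` are hypothetical helpers, not the authors' code. The first function checks that the reported F1 score agrees with the reported precision and recall. The second applies the standard net-benefit definition that underlies decision curve analysis, using made-up counts that do not come from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit at one risk threshold (standard decision-curve definition).

    tp, fp: true and false positives when treating everyone whose
            predicted risk is at or above `threshold`
    n:      total number of subjects
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(precision=0.71, recall=0.82)
print(round(reported_f1, 2))  # 0.76

# Hypothetical counts, for illustration only (NOT taken from the study).
example_nb = net_benefit(tp=40, fp=20, n=200, threshold=0.2)
print(example_nb)
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies. The abstract does not state which thresholds or software package were used for this step.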