Abstract
Metabolic dysfunction-associated fatty liver disease (MAFLD) is a highly prevalent liver condition closely linked to obesity, insulin resistance, and metabolic syndrome. Early identification of MAFLD remains challenging, especially in routine health examination settings where conventional indicators often fail to capture deeper metabolic disturbances. This study aimed to evaluate the predictive value of body composition parameters and to develop and validate a non-invasive, machine learning-based classification model for MAFLD. A retrospective study was conducted using data from 23,348 adults who underwent health check-ups between 2017 and 2021 at a tertiary hospital in China. Body composition was assessed via bioelectrical impedance analysis, and MAFLD was diagnosed based on hepatic steatosis plus metabolic risk criteria. A total of 13 features, including body composition indicators and basic demographics, were initially considered. Feature selection was guided by multicollinearity diagnostics and model-based importance analysis. Eight machine learning models were constructed and evaluated using tenfold cross-validation. An independent external validation cohort of 3,357 participants from 2022 to 2023 was used to assess generalizability. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, recall, F1 score, and calibration metrics. Among all models, tree-based algorithms, including extreme gradient boosting, gradient boosting decision tree, and LightGBM, achieved the highest discriminative performance, with AUC values exceeding 0.96 in internal validation and above 0.95 in external validation. Visceral fat rating consistently emerged as the most important predictor, followed by waist circumference and body mass index. Logistic regression confirmed their independent associations with MAFLD after adjustment for key confounders.
Stratified analyses revealed variable patterns across sex, age, and body mass index groups, with visceral fat remaining a robust predictor in all subgroups. Body composition analysis, particularly visceral fat estimation, demonstrates strong diagnostic discrimination for MAFLD using non-invasive measurements. Integrating these parameters with machine learning enables accurate identification of MAFLD, supporting scalable screening and aiding diagnostic assessment in routine health examination, clinical, and public health settings.
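The evaluation workflow summarized above (multicollinearity screening, tenfold cross-validation, AUC and a calibration measure) can be sketched as follows. This is a minimal, NumPy-only illustration under stated assumptions, not the study's pipeline. A hand-written logistic regression stands in for the tree ensembles (XGBoost, GBDT, LightGBM) that the study actually used. Multicollinearity is checked with variance inflation factors (VIF). Stratified folds preserve the case/control ratio. The Brier score is used as one example of a calibration metric. The data come from a hypothetical synthetic cohort, and every function and variable name is illustrative.

```python
import numpy as np

# Hypothetical sketch only: logistic-regression stand-in and synthetic data,
# not the study's tree-based models or its cohort.


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression of
    column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def stratified_folds(y, k=10, seed=0):
    """Give each sample a fold id 0..k-1, dealing each class round-robin."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold[idx] = np.arange(len(idx)) % k
    return fold


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Batch gradient-descent logistic regression (stand-in classifier)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_logistic(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def cross_validated_metrics(X, y, k=10, seed=0):
    """Mean out-of-fold AUC and Brier score over k stratified folds."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    fold = stratified_folds(y, k, seed)
    aucs, briers = [], []
    for f in range(k):
        train, test = fold != f, fold == f
        # Standardize with training-fold statistics only, to avoid leakage.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        w = fit_logistic((X[train] - mu) / sd, y[train])
        p = predict_logistic(w, (X[test] - mu) / sd)
        aucs.append(roc_auc(y[test], p))
        briers.append(np.mean((p - y[test]) ** 2))
    return float(np.mean(aucs)), float(np.mean(briers))


def make_synthetic_cohort(n=2000, seed=42):
    """Hypothetical data: visceral fat drives risk; waist and BMI track it."""
    rng = np.random.default_rng(seed)
    vfr = rng.normal(10, 4, n)                    # stand-in visceral fat rating
    waist = 70 + 2.0 * vfr + rng.normal(0, 4, n)  # deliberately collinear with vfr
    bmi = 18 + 0.6 * vfr + rng.normal(0, 2, n)    # deliberately collinear with vfr
    age = rng.normal(45, 12, n)
    X = np.column_stack([vfr, waist, bmi, age])
    logit = -6 + 0.5 * vfr + 0.02 * age
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    print("VIF:", np.round(variance_inflation_factors(X), 2))
    print("10-fold mean AUC, Brier:", cross_validated_metrics(X, y))
```

On the synthetic cohort, the waist and BMI columns receive elevated VIFs because they were generated from the visceral fat column. This is the kind of redundancy that multicollinearity diagnostics are meant to flag before modeling. With scikit-learn available, `StratifiedKFold` combined with `cross_val_score(..., scoring="roc_auc")` could replace the hand-written folds and AUC, and a gradient-boosting classifier could replace the logistic stand-in.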