Abstract
AIMS: Approximately 25%-30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD). This study aimed to explore whether NAFLD could be effectively detected using 341 volatile organic compounds (VOCs) via 10 machine learning (Mach-L) algorithms in a cohort of 1,501 individuals. METHODS: Participants were selected from the Taiwan MJ cohort, which includes comprehensive demographic, biochemical, lifestyle, and VOCs data. NAFLD was diagnosed by experienced gastroenterologists. Exhaled breath samples were collected using a 1.0-L aluminum bag (late expiratory fraction) and analyzed with selected-ion flow-tube mass spectrometry. Ten Mach-L techniques were employed to evaluate two predictive models: Model 1 (demographic, lifestyle, and biochemical data), and Model 2 (Model 1 + VOCs), assessed using area under the receiver operating characteristic curve (AUC). RESULTS: Subjects with NAFLD had significantly higher values for age, BMI, blood pressure, and other biomedical markers, except for eGFR and HDL-C. Key predictors of NAFLD included BMI, triglycerides (TG), uric acid (UA), fasting plasma glucose (FPG), γ-GT, gender, LDL-C, and sleep duration. The addition of VOCs to Model 1 improved the AUC from 0.722 ± 0.149 to 0.770 ± 0.264 (p < 0.001). Ten VOCs were identified as the most influential, in order of importance: 2-propanol, acetone, butyl 2-methylbutanoate, diethylethanolamine, urethane, β-caryophyllene, furfural, tridecane, 4-methyloctanoic acid, and (S)-2-methyl-1-butanol. CONCLUSION: Incorporating VOCs into traditional demographic, biochemical, and lifestyle data significantly enhanced the model's predictive performance. This suggests that VOCs may be associated with the underlying pathophysiology of NAFLD.