Abstract
BACKGROUND: Building on evidence linking urinary glyphosate to chronic liver disease (CLD) and hepatocellular carcinoma (HCC), we integrated urinary pesticide profiling with machine learning risk prediction (MLRP) to stratify risk in high-exposure populations. METHODS: We conducted a case-control study within the Thailand Initiative in Genomics and Expression Research for Liver Cancer (TIGER-LC; 2011-2016; n=593), comprising 228 participants with CLD, 116 with HCC, and 249 controls. Eight urinary pesticides were quantified by LC-MS/MS: pendimethalin, oxadiazon, metsulfuron-methyl, butachlor, 2,4-dichlorophenoxyacetic acid (2,4-D), cypermethrin, flocoumafen, and bromadiolone. A composite Pesticide Load Score (PLS), calculated with and without glyphosate, was used to estimate pesticide burden. Two predictive models were developed: a logistic-regression Pesticide-Informed Liver Cancer Risk Score (PILCRS) and an Extreme Gradient Boosting (XGBoost) classifier that incorporated age, sex, alcohol use, occupation, and PLS. Internal validity was assessed with 1,000 bootstrap resamples and optimism-corrected calibration. FINDINGS: Predicted CLD probability increased from 30% in the lowest PLS quartile to over 70% in the highest, and predicted HCC probability from 10% to 40% (p<0·0001). Relative estimates were consistent with this gradient: for the highest versus the lowest quartile, odds ratios were 2·84 (95% CI 1·66-4·91) for CLD and 4·76 (2·30-10·29) for HCC. Cypermethrin remained independently associated. After optimism correction, both models demonstrated strong discrimination and calibration. INTERPRETATION: This framework establishes a scalable, exposure-informed tool for liver disease prediction. The findings underscore pesticide burden as a modifiable risk factor and align with Sustainable Development Goal 3·9 and WHO-FAO priorities in low- and middle-income countries (LMICs). External validation is essential. FUNDING: National Institutes of Health (USA); Thailand Science Research and Innovation.
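As an illustration of the bootstrap internal validation mentioned in the Methods, the following minimal sketch shows one common form of optimism correction: refit the model on each bootstrap resample, subtract its performance on the original data from its performance on the resample, and deduct the average difference from the apparent performance. Everything here is an assumption made for illustration and is not taken from the study: the data are simulated, the covariate names are hypothetical, the model is a small gradient-descent logistic regression rather than PILCRS or XGBoost, the metric is the area under the ROC curve (AUC), and only 30 resamples are drawn for speed where the study used 1,000. The same loop could in principle be applied to a calibration measure instead of the AUC.

```python
import math
import random


def auc(scores, labels):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.5, epochs=100):
    """Batch gradient-descent logistic regression; returns [intercept, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def linear_predictor(w, X):
    """Linear predictor (log-odds); sufficient for ranking-based metrics like AUC."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def optimism_corrected_auc(X, y, n_boot=30, seed=0):
    """Return (apparent AUC, optimism-corrected AUC) via bootstrap resampling."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(linear_predictor(fit_logistic(X, y), X), y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip degenerate resamples with a single class
            continue
        wb = fit_logistic(Xb, yb)
        auc_boot = auc(linear_predictor(wb, Xb), yb)  # performance on the resample
        auc_orig = auc(linear_predictor(wb, X), y)    # same model, original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / len(optimism)


# Simulated data: hypothetical standardized load score, one covariate, one noise term.
data_rng = random.Random(42)
X, y = [], []
for _ in range(150):
    pls, age, noise = (data_rng.gauss(0, 1) for _ in range(3))
    p = 1.0 / (1.0 + math.exp(-(1.2 * pls + 0.5 * age)))
    X.append([pls, age, noise])
    y.append(1 if data_rng.random() < p else 0)

apparent_auc, corrected_auc = optimism_corrected_auc(X, y)
print(f"apparent AUC:  {apparent_auc:.3f}")
print(f"corrected AUC: {corrected_auc:.3f}")
```

The corrected value is usually somewhat lower than the apparent one, because a model tends to look better on the data it was fitted to. This kind of correction addresses overfitting within one dataset only and is not a substitute for external validation.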