Abstract
INTRODUCTION: This study investigated the association between liver disease (LD) and stroke using cross-sectional data from the China Health and Retirement Longitudinal Study (CHARLS). METHODS: Participants aged ≥45 years with complete data on LD, stroke, and key covariates were selected from the 2018 CHARLS wave (n = 4586). The association was assessed using sequential multivariable logistic regression and weighted stratified analyses. To explore more complex relationships, five machine learning models were applied: support vector machine (SVM), logistic regression (LR), recursive partitioning (RPART), random forest (RF), and naive Bayes (NB). The data were split into training (70%) and test (30%) sets, and the Random Over-Sampling Examples (ROSE) technique was applied during training to address class imbalance. RESULTS: Baseline analysis revealed a significant association between LD and stroke (P < 0.001). In the fully adjusted model (Model 3), LD remained significantly associated with stroke (OR = 2.6, 95% CI = 1.43-4.46, P = 0.001), and the model achieved an area under the curve (AUC) of 0.70. Stratified analyses suggested that the association was robust across subgroups. After rigorous validation and class imbalance adjustment, the exploratory machine learning models, including the random forest algorithm, did not demonstrate meaningful predictive performance for stroke in this dataset. CONCLUSION: This cross-sectional analysis identifies a significant association between LD and stroke in Chinese adults aged ≥45 years. Machine learning served primarily as an analytical complement, and its results underscore the importance of methodological rigor, particularly in handling class imbalance. The observational design precludes causal inference, but the findings point to a concurrent link between LD and stroke that warrants longitudinal investigation.
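The training protocol described in METHODS (split the data first, then rebalance only the training portion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it substitutes plain random oversampling (duplicating minority-class rows) for ROSE, which to our understanding generates synthetic examples instead, it uses synthetic labels in place of CHARLS data, and the helper names `stratified_split` and `oversample_minority` are hypothetical.

```python
import random
from collections import Counter


def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx), preserving class proportions in each part."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def oversample_minority(train_idx, labels, seed=0):
    """Randomly duplicate smaller-class training rows until all classes match the largest.

    Assumption: a simplified stand-in for ROSE, not ROSE itself.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap up to the majority-class size.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


# Synthetic labels only (not CHARLS data): 1000 rows, 5% positive.
labels = [1] * 50 + [0] * 950

train_idx, test_idx = stratified_split(labels, test_frac=0.3, seed=42)
balanced_train = oversample_minority(train_idx, labels, seed=42)

train_counts = Counter(labels[i] for i in balanced_train)
test_counts = Counter(labels[i] for i in test_idx)
print("balanced training classes:", dict(train_counts))
print("untouched test classes:", dict(test_counts))
```

Rebalancing after the split keeps the test set at its original class ratio, so held-out performance reflects the prevalence a model would actually face, and no duplicated training row can leak into the test set.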