Abstract
BACKGROUND: Chronic obstructive pulmonary disease (COPD) imposes a substantial global health burden, and COPD management is closely associated with patients' health information literacy (HIL). Because HIL is shaped by multiple factors whose relationships may be nonlinear, traditional analyses may fail to capture these complex patterns. We therefore applied machine learning to identify key predictors of HIL and inform targeted interventions.

METHODS: We recruited 432 patients with COPD from the respiratory outpatient clinics of six tertiary hospitals in Hunan Province, China, between December 2023 and December 2024. Data were collected using a general information questionnaire, the Health Information Literacy Questionnaire, and the COPD Self-Management Scale. A random forest model was used to rank candidate predictors, and LASSO regression was used for variable selection.

RESULTS: The mean HIL score was 15.22 ± 2.44, and 16.9% of participants had adequate HIL (≥ 60). The random forest model and LASSO regression both identified self-management as the most influential predictor of HIL, followed by age and education level (P < 0.05). In multivariable linear regression, higher self-management and education level were associated with higher HIL, whereas older age was associated with lower HIL (all P < 0.05); the model explained 56.7% of the variance in HIL (adjusted R² = 0.567).

CONCLUSION: HIL among COPD patients was suboptimal, with information search emerging as the weakest domain. Self-management, age, and education level were independently associated with HIL. Interventions should prioritize modifiable targets, particularly strengthening self-management skills and delivering age-friendly, literacy-sensitive education to older and less-educated patients.
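The selection-then-regression workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code. It uses synthetic data only, with the sample size set to 432 to match the study. The predictor names (`self_management`, `age`, `education`, `noise_1` … `noise_5`), the true effect sizes, and the fixed penalty `lam=0.12` are all hypothetical. The random forest ranking step is omitted so the example needs only NumPy. LASSO is fitted by cyclic coordinate descent on standardized predictors, and the selected variables are then refitted with ordinary least squares to obtain an adjusted R², computed as 1 − (1 − R²)(n − 1)/(n − p − 1).

```python
import numpy as np


def standardize(X):
    """Center and scale each column to mean 0, SD 1 (LASSO penalizes all coefficients equally)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """LASSO via cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, assuming X is
    standardized and y is centered, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                        # residual y - X b (b starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n   # equals 1 for standardized columns
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # Soft-thresholding: coefficients with |rho| <= lam are set to zero
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta    # keep the residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def ols_adjusted_r2(X, y):
    """Ordinary least squares with an intercept; returns coefficients and adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, adj_r2


def run_demo(seed=0):
    """Synthetic illustration only: three hypothetical signal predictors plus five noise columns."""
    rng = np.random.default_rng(seed)
    n = 432
    names = ["self_management", "age", "education",
             "noise_1", "noise_2", "noise_3", "noise_4", "noise_5"]
    beta_true = np.array([0.6, -0.35, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0])
    X_raw = rng.standard_normal((n, len(names)))
    y = X_raw @ beta_true + 0.5 * rng.standard_normal(n)

    # Step 1: LASSO variable selection on standardized predictors
    b = lasso_cd(standardize(X_raw), y - y.mean(), lam=0.12)
    keep = b != 0.0
    selected = [name for name, k in zip(names, keep) if k]

    # Step 2: refit the selected predictors with multivariable linear regression
    _, adj_r2 = ols_adjusted_r2(X_raw[:, keep], y)
    return selected, b, adj_r2


selected, b, adj_r2 = run_demo()
print("LASSO-selected predictors:", selected)
print("adjusted R^2 of OLS refit:", round(adj_r2, 3))
```

In practice the penalty is usually tuned (for example by cross-validation) rather than fixed. The abstract does not state how the authors chose it, so the fixed value here is purely illustrative.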