Abstract
OBJECTIVE: This study aimed to develop and evaluate deep learning models that improve the prediction of persistent smoking in patients with chronic obstructive pulmonary disease (COPD) by integrating behavioral and psychosocial variables with clinical data from a structured national dataset. METHODS: Three deep learning models and one conventional machine learning model were developed and assessed using clinical, behavioral, and psychosocial data from 350 patients with COPD, including 51 current smokers. Data preprocessing involved imputation, variable transformation, and class weighting. Hyperparameter optimization was performed using the Optuna framework. Model performance was evaluated using repeated stratified K-fold cross-validation, with the macro F1 score as the primary metric. SHapley Additive exPlanations (SHAP) were applied to assess feature importance and improve interpretability. RESULTS: The Residual Neural Network achieved the highest performance, with a macro F1 score of .87 (95% confidence interval: .83-.89). SHAP analysis identified professional advice to quit, employment status, sputum symptoms lasting more than 3 months, perceived stress level, health check-up experience, and health literacy as key predictors of persistent smoking. CONCLUSION: Incorporating behavioral and psychosocial data enabled the models to capture complex smoking patterns while maintaining interpretability. These findings underscore the value of multidimensional data for identifying high-risk individuals and informing targeted smoking cessation strategies in COPD care. Future research should incorporate synthesized behavioral variables, which are often absent from large external datasets, and validate model performance in more diverse populations.
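The choice of macro F1 over accuracy matters because the classes are imbalanced. The following minimal sketch is an illustration only, not the authors' code. It assumes 51 positive (current smoker) and 299 negative labels, taken from the cohort counts above, and scores a trivial classifier that always predicts "not smoking". Accuracy looks high, but macro F1 averages the per-class F1 scores, so the missed minority class pulls it down sharply.

```python
def f1_for_class(y_true, y_pred, cls):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the class never occurs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally regardless of size."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Assumed labels: 1 = current smoker (51 patients), 0 = not smoking (299 patients).
y_true = [1] * 51 + [0] * 299
# Hypothetical baseline that always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3), round(macro_f1(y_true, y_pred), 3))  # prints: 0.854 0.461
```

With this baseline, accuracy is about 0.85, but macro F1 is only about 0.46 because the smoker class receives an F1 of 0. A metric like this rewards models that actually detect the minority class.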