Abstract
AIM/INTRODUCTION: Chronic obstructive pulmonary disease (COPD) is a complex, heterogeneous syndrome frequently accompanied by vascular diseases that worsen prognosis and quality of life. This study aimed to develop a machine learning model to identify concurrent vascular disease in symptomatic COPD patients.

MATERIALS AND METHODS: We retrospectively analyzed data from 6,274 COPD patients treated between July 2010 and July 2018. Patients were randomly split into training and validation sets at a 7:3 ratio. After feature selection with LASSO regression, eight machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, Support Vector Machine, Neural Network, Convolutional Neural Network, AdaBoost, and Stacked Generalization, or Stacking) were used to develop and validate predictive models. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA).

RESULTS: The Stacking model achieved the highest AUC (0.867; 95% CI: 0.852–0.882), with 79.4% accuracy, 74.9% sensitivity, and 84.0% specificity. It also showed excellent calibration and, on DCA, provided the highest net clinical benefit across threshold probabilities of 0.1–0.5. At a threshold probability of 0.2, the model could avoid approximately 35% of unnecessary interventions compared with a "treat-all" approach, while identifying about 75% of high-risk patients compared with a "treat-none" strategy.

CONCLUSIONS: The Stacking machine learning model outperformed the other evaluated algorithms in identifying concurrent vascular disease among symptomatic COPD patients, combining strong discrimination, good calibration, and clinical utility. It may serve as an effective decision-support tool for optimizing diagnostic evaluation in this high-risk subgroup.