Abstract
BACKGROUND: Machine learning based on clinical characteristics has the potential to predict coronary CT angiography (CCTA) findings and help guide resource utilisation. METHODS: From the SCOT-HEART (Scottish Computed Tomography of the HEART) trial, data from 1769 patients were used to train and test machine learning models (XGBoost, 10-fold cross-validation, grid search hyperparameter selection). Two models were generated separately to predict the presence of coronary artery disease (CAD) and an increased burden of low-attenuation coronary artery plaque (LAP) using symptoms, demographic and clinical characteristics, electrocardiography and exercise tolerance testing (ETT). RESULTS: Machine learning predicted the presence of CAD on CCTA (area under the curve (AUC) 0.80, 95% CI 0.74 to 0.85) better than the 10-year cardiovascular risk score alone (AUC 0.75, 95% CI 0.70 to 0.81, p=0.004). The most important features in this model were the 10-year cardiovascular risk score, age, sex, total cholesterol and an abnormal ETT. In contrast, the second model, used to predict an increased LAP burden, performed similarly to the 10-year cardiovascular risk score (AUC 0.75, 95% CI 0.70 to 0.80 vs AUC 0.72, 95% CI 0.66 to 0.77, p=0.08), with the most important features being the 10-year cardiovascular risk score, age, body mass index, and total and high-density lipoprotein cholesterol concentrations. CONCLUSION: Machine learning models can improve prediction of the presence of CAD on CCTA over the standard cardiovascular risk score. However, it was not possible to improve the prediction of an increased LAP burden based on clinical factors alone.
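
As an illustration of the AUC metric used to compare the models with the risk score, the following is a minimal sketch and not the trial's analysis code. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name `auc` and the toy values are assumptions made for illustration only; they are not SCOT-HEART data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted risk for each patient (higher means more likely positive)
    labels: 1 if the finding is present (e.g. CAD on CCTA), else 0
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy, made-up values (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_scores, toy_labels))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values of 0.80 and 0.75 should be read.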