A machine learning model for predicting congenital heart defects from administrative data

Abstract

INTRODUCTION: International Classification of Diseases (ICD) codes recorded in administrative data are often used to identify congenital heart defects (CHD). However, these codes can misclassify individuals, so not everyone flagged is a true positive (TP) CHD case. CHD surveillance could be strengthened by identifying CHD more accurately in administrative records using machine learning (ML) algorithms.

METHODS: To identify features relevant to accurate CHD identification, traditional ML models were applied to a validated dataset of 779 patients. Encounter-level data, including ICD-9-CM and CPT codes, from 2011 to 2013 at four US sites were used. Five-fold cross-validation determined the overlapping important features that best predicted TP CHD individuals. Median values and 95% confidence intervals (CIs) of area under the receiver operating characteristic curve, positive predictive value (PPV), negative predictive value, sensitivity, specificity, and F1-score were compared across four ML models: Logistic Regression, Gaussian Naive Bayes, Random Forest, and eXtreme Gradient Boosting (XGBoost).

RESULTS: Baseline PPV was 76.5%, based on expert clinician validation of ICD-9-CM CHD-related codes. Feature selection for ML reduced 7138 features to the 10 that best predicted TP CHD cases. During training and testing, XGBoost performed best on median F1-score and PPV, at 0.84 (95% CI: 0.76, 0.91) and 0.94 (95% CI: 0.91, 0.96), respectively. When applied to the entire dataset, XGBoost yielded a median PPV of 0.94 (95% CI: 0.94, 0.95).

CONCLUSIONS: Applying ML algorithms improved the accuracy of identifying TP CHD cases compared with ICD codes alone. Using this technique to identify CHD cases would improve the generalizability of results obtained from large datasets to the CHD patient population, enhancing public health surveillance efforts.
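The evaluation metrics named in the abstract can all be derived from confusion-matrix counts. The sketch below is an illustration only, not the authors' code: the function names `classification_metrics` and `median_metric` and the per-fold counts are hypothetical, chosen to show how a median PPV across cross-validation folds might be computed.

```python
from statistics import median


def classification_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, sensitivity, specificity, and F1 from counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. A metric with a zero denominator is NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # precision
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def median_metric(fold_counts, name):
    """Median of one metric across folds given (tp, fp, tn, fn) tuples."""
    return median(classification_metrics(*c)[name] for c in fold_counts)


# Hypothetical counts for three folds; NOT data from the study.
example_folds = [(8, 2, 5, 5), (9, 1, 5, 5), (7, 3, 5, 5)]
example_median_ppv = median_metric(example_folds, "ppv")
```

PPV is the natural headline metric here because the baseline figure in the abstract (76.5%) is itself a PPV: the share of ICD-flagged individuals whom clinicians confirmed as true CHD cases.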
