Machine Learning Algorithms to Predict Heavy Episodic Drinking in the United States Using Survey Data



Abstract

INTRODUCTION: Heavy episodic drinking (HED) is a major public health concern but is often missing from surveys or measured unreliably. In such cases, predictive models offer a way to estimate the likelihood of HED at the individual level. While logistic regression is commonly used, other machine learning algorithms (MLAs) may offer greater accuracy and robustness. This study compares several MLAs to identify the best predictive model of HED.

METHODS: Data from the 1997-2018 National Health Interview Survey were used. Six MLAs were trained and cross-validated: logistic regression, naïve Bayes, k-nearest neighbour, support vector machine, random forest and XGBoost. Model performance was compared, and the SHapley Additive exPlanations (SHAP) method was used to assess interpretability by ranking features according to their contribution to the model's prediction.

RESULTS: The probability of correctly ranking a randomly selected HED instance higher than a non-HED instance ranged from 0.85 to 0.97 (with values closer to 1 indicating better performance). XGBoost outperformed the other MLAs (sensitivity 0.80, precision 0.83, accuracy 0.92). Amongst the 11 features included in the models, average daily alcohol use and age were the most influential, as determined by SHAP values.

DISCUSSION AND CONCLUSIONS: The strong discriminative ability of our models shows that even a limited number of well-chosen features can yield robust predictions, highlighting the potential of MLAs for modelling health behaviours. Integrating our models into simulation frameworks can help model HED and test scenarios, supporting the design of effective policies. Future studies should incorporate objective sources for external validation and investigate systematic biases to improve predictive accuracy.
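The performance measures reported in the results can be made concrete with a small sketch. The ranking probability described there corresponds to the area under the ROC curve, and sensitivity, precision and accuracy follow from the confusion counts at a chosen threshold. The function names, the 0.5 threshold and the toy labels and scores below are assumptions made for illustration only; they are not the study's data or the authors' pipeline.

```python
from itertools import product


def auc_by_ranking(labels, scores):
    """Share of (HED, non-HED) pairs in which the HED case gets the higher score.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Sensitivity, precision and accuracy after cutting scores at `threshold`."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example (invented values): 1 = HED, 0 = non-HED.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

print(auc_by_ranking(labels, scores))
print(threshold_metrics(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a demonstration; for survey-sized data a rank-based implementation from an established library would be the more practical choice.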
