Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning

Abstract

BACKGROUND: Attrition is a challenge for parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi.

METHODS: All individuals in the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) (Korogocho and Viwandani) were screened for epilepsy in two stages. Attrition cases were defined as individuals with probable epilepsy identified at stage-I who did not attend stage-II (neurologist assessment). Categorical variables were one-hot encoded, class imbalance was addressed using the synthetic minority over-sampling technique (SMOTE), and numeric variables were scaled and centered. The dataset was split into training and testing sets (7:3 ratio), and seven machine learning models, including the ensemble Super Learner, were trained. Hyperparameters were tuned using 10-fold cross-validation, and model performance was evaluated using metrics including area under the curve (AUC), accuracy, Brier score, and F1 score over 500 bootstrap samples of the test data.

RESULTS: Random forest (AUC = 0.98, accuracy = 0.95, Brier score = 0.06, F1 = 0.94), extreme gradient boosting (XGB) (AUC = 0.96, accuracy = 0.91, Brier score = 0.08, F1 = 0.90), and support vector machine (SVM) (AUC = 0.93, accuracy = 0.93, Brier score = 0.07, F1 = 0.92) were the best-performing base learners. The ensemble Super Learner had similarly high performance. Important predictors of attrition included proximity to industrial areas, male gender, employment, education, smaller households, and a history of complex partial seizures.

CONCLUSION: These findings can help researchers plan targeted mobilization for scheduled clinical appointments to improve follow-up rates. They will also inform the development of a web-based algorithm to predict attrition risk and support targeted follow-up efforts in similar studies.
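The evaluation step described in METHODS (AUC, accuracy, Brier score, and F1 summarized over bootstrap resamples of the test set) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 classification threshold, the percentile-based interval, and the toy data are all assumptions made for illustration, and only the Python standard library is used.

```python
import random

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of correct predictions after thresholding probabilities."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)

def f1_score(y_true, y_prob, threshold=0.5):
    """Harmonic mean of precision and recall for the positive class."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y_true) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, y_prob):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

METRICS = {"auc": auc, "accuracy": accuracy, "brier": brier_score, "f1": f1_score}

def bootstrap_metrics(y_true, y_prob, n_boot=500, seed=0):
    """Resample the test set with replacement and summarize each metric.

    Returns {metric: (mean, 2.5th percentile, 97.5th percentile)}.
    """
    rng = random.Random(seed)
    n = len(y_true)
    draws = {name: [] for name in METRICS}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        for name, fn in METRICS.items():
            draws[name].append(fn(yt, yp))
    summary = {}
    for name, values in draws.items():
        values.sort()
        lo = values[int(0.025 * (len(values) - 1))]
        hi = values[int(0.975 * (len(values) - 1))]
        summary[name] = (sum(values) / len(values), lo, hi)
    return summary

if __name__ == "__main__":
    # Toy test-set labels (1 = attrition) and predicted probabilities; hypothetical data.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    y_prob = [0.9, 0.8, 0.7, 0.2, 0.1, 0.4, 0.6, 0.3, 0.35, 0.55]
    for name, (mean, lo, hi) in bootstrap_metrics(y_true, y_prob).items():
        print(f"{name}: {mean:.3f} ({lo:.3f} to {hi:.3f})")
```

Resampling only the held-out test data, as the abstract describes, gives a sense of how stable each metric is without refitting any model. Resamples that happen to contain a single class are skipped here because AUC cannot be computed for them; other conventions are possible.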
