A hybrid framework of statistical, machine learning, and explainable AI methods for school dropout prediction

Abstract

Student dropout is a significant challenge in Bangladesh, with serious impacts on both educational and socio-economic outcomes. This study investigates the factors influencing school dropout among students aged 6-24 years, using data from the 2019 Multiple Indicator Cluster Survey (MICS). The research integrates statistical analysis with machine learning (ML) techniques and explainable AI (XAI) to identify key predictors and enhance model interpretability. Descriptive and inferential statistical analyses were first applied to identify significant predictors and guide feature selection. A hybrid feature selection strategy, combining statistical significance with model-based importance measures, identified the key features. Logistic regression was applied to identify statistically significant predictors of school dropout. Two ML algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGB), were used to build predictive models, and model performance was evaluated using accuracy, precision, recall, and F1 score. The XGB model achieved the best performance, with an accuracy of 94.4%, followed by the RF model. To interpret model predictions and ensure transparency, SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations) were employed alongside the statistical analyses. Key factors influencing student dropout included age, sex, completed grade, last education grade, division, wealth index, father's education, and mother's education. These insights offer a data-driven foundation for policymakers to develop targeted interventions aimed at reducing student dropout rates and improving educational outcomes in Bangladesh.
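
Two steps named in the abstract, hybrid feature selection and evaluation by accuracy, precision, recall, and F1, can be sketched in a few lines. The following is a minimal, illustrative Python sketch, not the authors' code. `binary_metrics` computes the four metrics from binary labels, assuming dropout is coded as the positive class (1). `hybrid_select` is a hypothetical rule: the abstract does not say how statistical significance and model-based importance were combined, so keeping features that are both significant (p < alpha) and among the top_k by importance is an assumption. The same applies to the default alpha and top_k values and to the toy inputs.

```python
"""Illustrative sketch only; not the study's implementation.

Assumptions: binary labels with dropout coded as 1 (the positive class),
and an intersection rule for the hybrid feature selection.
"""


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted dropout, actually dropped out
        elif p == positive:
            fp += 1  # predicted dropout, actually stayed in school
        elif t == positive:
            fn += 1  # missed a real dropout
        else:
            tn += 1  # correctly predicted staying in school
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def hybrid_select(p_values, importances, alpha=0.05, top_k=8):
    """Hypothetical hybrid rule: keep features that are significant AND highly ranked.

    p_values: feature -> p-value from a statistical test (e.g. logistic regression).
    importances: feature -> model-based importance score (e.g. from RF or XGB).
    Features missing from p_values are treated as non-significant.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)[:top_k]
    return [name for name in ranked if p_values.get(name, 1.0) < alpha]


if __name__ == "__main__":
    # Toy labels, not survey data.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
    print(binary_metrics(y_true, y_pred))

    # Toy scores for hypothetical features.
    p_vals = {"feat_a": 0.001, "feat_b": 0.03, "feat_c": 0.20, "feat_d": 0.004}
    imps = {"feat_a": 0.35, "feat_b": 0.05, "feat_c": 0.25, "feat_d": 0.15}
    print(hybrid_select(p_vals, imps, top_k=3))
```

With the toy inputs, `hybrid_select` keeps `feat_a` and `feat_d`. `feat_b` is significant but falls outside the top 3 by importance, while `feat_c` ranks highly but is not significant. A union of the two criteria, or a weighted combined score, would be an equally plausible reading of "combining" them.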