Applicability analysis of tree-based ensemble learning for air pollutant prediction models



Abstract

To support coordinated air quality management, this study developed a tree-based machine learning framework for multi-pollutant forecasting. We systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Decision Tree (DT) models for six key pollutants: PM2.5, PM10, NO2, SO2, CO, and O3, using high-resolution environmental monitoring data (10 km resolution) from China’s four major municipalities (2021–2024). A comprehensive feature system was constructed incorporating meteorology-emission interaction terms. SHapley Additive exPlanations (SHAP) values were employed to quantify feature contributions. Key findings demonstrate: (1) RF achieved optimal performance in particulate matter prediction (PM2.5: R² = 0.99, RMSE = 0.11 µg/m³; PM10: R² = 0.98); (2) GBDT showed comparable accuracy to RF for NO2 (R² = 0.85) and CO (R² = 0.98) with minimal differences (ΔR² ≤ 0.03); (3) DT exhibited competitive O3 prediction capability (R² = 0.88). SHAP analysis revealed critical mechanisms, such as CO’s positive synergistic effect (SHAP = 0.136) in PM2.5 prediction and O3 generation sensitivity to temperature (SHAP = 0.076). This research provides an interpretable, multi-pollutant forecasting framework applicable to urban air quality warning systems and offers model selection guidance for environmental regulation strategies.
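The abstract compares models by R², RMSE, and the gap ΔR² between two models. As a minimal sketch of how those metrics are conventionally computed, the snippet below uses the standard definitions on a handful of made-up numbers. The function names, the `model_a`/`model_b` labels, and every value are hypothetical and are not taken from the study; the authors' actual evaluation code is not shown in the source.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and predictions (illustrative only, NOT study data).
observed = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "model_a": [12.0, 18.0, 31.0, 39.0],
    "model_b": [10.0, 20.0, 30.0, 36.0],
}

# Score each model, then take the absolute R² gap between the two.
scores = {
    name: {"r2": r2(observed, pred), "rmse": rmse(observed, pred)}
    for name, pred in predictions.items()
}
delta_r2 = abs(scores["model_a"]["r2"] - scores["model_b"]["r2"])

for name, s in scores.items():
    print(f"{name}: R2={s['r2']:.3f}, RMSE={s['rmse']:.3f}")
print(f"delta R2 = {delta_r2:.3f}")
```

RMSE carries the units of the pollutant concentration, so it is only comparable across models for the same pollutant. R² is unitless, which is presumably why the abstract uses ΔR² when judging whether two models are effectively equivalent.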
