Abstract
To support coordinated air quality management, this study developed a tree-based machine learning framework for multi-pollutant forecasting. We systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Decision Tree (DT) models for six key pollutants (PM2.5, PM10, NO₂, SO₂, CO, and O₃) using high-resolution (10 km) environmental monitoring data from China’s four major municipalities (2021–2024). A comprehensive feature system incorporating meteorology-emission interaction terms was constructed, and SHapley Additive exPlanations (SHAP) values were used to quantify feature contributions. The key findings are as follows: (1) RF performed best for particulate matter (PM2.5: R² = 0.99, RMSE = 0.11 µg/m³; PM10: R² = 0.98); (2) GBDT achieved accuracy comparable to RF for NO₂ (R² = 0.85) and CO (R² = 0.98), with minimal differences between the two models (ΔR² ≤ 0.03); (3) DT showed competitive predictive capability for O₃ (R² = 0.88). SHAP analysis revealed key mechanisms, such as the positive synergistic contribution of CO (SHAP = 0.136) to PM2.5 prediction and the sensitivity of O₃ formation to temperature (SHAP = 0.076). This research provides an interpretable multi-pollutant forecasting framework applicable to urban air quality warning systems and offers model selection guidance for environmental regulation strategies.
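The model comparison above rests on two metrics, R² and RMSE, plus the R² gap between models (ΔR²). The sketch below uses the standard textbook definitions, R² = 1 − SS_res / SS_tot and RMSE = √(mean squared error). It assumes the study computed them this way, since the exact implementation is not given. The observed and predicted values are hypothetical toy numbers, not study data.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # total sum of squares
    if ss_tot == 0:
        raise ValueError("observations are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the observations (e.g. µg/m³)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy concentrations (NOT study data) for two competing models.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 19.0, 31.0, 39.0]
model_b = [12.0, 18.0, 32.0, 38.0]

r2_a = r2_score(observed, model_a)
r2_b = r2_score(observed, model_b)
delta_r2 = abs(r2_a - r2_b)  # the ΔR² gap used to call two models "comparable"

print(f"model A: R² = {r2_a:.3f}, RMSE = {rmse(observed, model_a):.2f}")
print(f"model B: R² = {r2_b:.3f}, RMSE = {rmse(observed, model_b):.2f}")
print(f"ΔR² = {delta_r2:.3f}")
```

With these toy values the two models differ by ΔR² = 0.024. Under a ΔR² ≤ 0.03 threshold like the one quoted above, they would count as comparable even though model B's RMSE is twice that of model A. This is why reporting RMSE alongside R² is informative.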