Abstract
Air pollution is a global problem that threatens environmental sustainability and severely affects public health. Monitoring air quality and predicting future pollution levels are critical for designing effective environmental policies and for enabling individuals to take precautions. This study presents a long-term assessment of daily Air Quality Index (AQI) prediction using machine learning models trained on meteorological and pollutant data collected in eastern Türkiye from 2016 to 2024. The dataset includes four major air pollutants (PM₁₀, SO₂, NO₂, O₃) and five meteorological variables (temperature, precipitation, relative humidity, wind direction, and wind speed). Three models, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Support Vector Machine (SVM), were evaluated using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Among these, XGBoost achieved the highest prediction accuracy (R² = 0.999, RMSE = 0.234, MAE = 0.158). The results demonstrate that ensemble-based machine learning approaches, particularly XGBoost, can effectively model AQI fluctuations from environmental predictors. These findings can inform air quality forecasting and early warning systems and support regional air pollution management, public health protection, and the development of environmental health policies.
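For readers unfamiliar with the three evaluation metrics, the following is a minimal sketch of how R², RMSE, and MAE are conventionally computed from observed and predicted values. The function name `regression_metrics` and the sample AQI values are hypothetical, chosen only for illustration; they are not taken from the study's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: 1 minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return r2, rmse, mae


# Made-up daily AQI values, used only to exercise the function.
observed = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

RMSE and MAE are expressed in the same units as the AQI itself, whereas R² is unitless, so the three values are best read together rather than in isolation.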