Abstract
Air pollution monitoring is essential for urban environmental management. However, traditional approaches, such as ground-based stations and satellite remote sensing, are constrained by high costs, limited spatial or temporal resolution, and poor nighttime applicability. This study develops a unified convolutional-recurrent neural network (CNN-RNN) framework that jointly learns spatial cues and temporal dynamics from surveillance image sequences to estimate the air quality index (AQI) under varying illumination, including night and twilight. Evaluated on more than 28,000 hourly images from six sites in southern Kaohsiung, Taiwan, the unified model consistently outperforms single-image baselines across sites and time periods and improves performance in the higher pollution categories. The same pipeline extends to PM2.5 and PM10 and can be adapted to other cities by fine-tuning on a small number of labeled samples. These results indicate that the framework can support accurate, round-the-clock air quality sensing and scalable deployment in camera networks to complement conventional monitoring.