Abstract
Air pollution monitoring systems rely on distributed sensors that record dynamic environmental conditions and often produce large volumes of heterogeneous, stochastic data. Efficient aggregation of these data is essential for reducing communication overhead while preserving the quality of information needed for decision making. In this paper, we propose an unsupervised learning approach for soft clustering of sensors in air pollution monitoring systems. Our method uses the Expectation-Maximization (EM) algorithm, a probabilistic unsupervised learning technique, to cluster sensors into sets corresponding to normal and polluted zones. The clustering is motivated by the need for a dynamic data transmission policy: sensors in polluted zones must intensify their operation for detailed monitoring, while sensors in clean zones can reduce their reporting rates and transmit condensed data summaries to alleviate network load and conserve energy. The cluster membership probability enables a tunable trade-off between data redundancy and monitoring accuracy. Simulation results validate the efficiency of the proposed AI-based clustering: under common pollution scenarios and with adequate sample sizes, the EM algorithm exhibits a relative error below 5%. The presented approach provides a foundation for a wide range of intelligent and adaptive data aggregation protocols.
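To make the soft clustering idea concrete, the following minimal sketch fits a two-component, one-dimensional Gaussian mixture with EM to synthetic sensor readings and maps the resulting membership probability to a reporting mode. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the Gaussian mixture model, the function names `fit_two_zone_mixture`, `polluted_probability`, and `reporting_mode`, the threshold value, and the synthetic concentrations are all hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def polluted_probability(x, weights, means, variances):
    """Posterior probability that reading x belongs to component 1 (assumed 'polluted')."""
    p0 = weights[0] * gaussian_pdf(x, means[0], variances[0])
    p1 = weights[1] * gaussian_pdf(x, means[1], variances[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5


def fit_two_zone_mixture(readings, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture (an assumed model for this sketch)."""
    n = len(readings)
    overall_mean = sum(readings) / n
    overall_var = sum((x - overall_mean) ** 2 for x in readings) / n
    # Initialise component 0 at the lowest reading and component 1 at the highest,
    # so that component 1 plays the role of the polluted zone.
    means = [min(readings), max(readings)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: soft membership of every reading in the polluted component.
        resp = [polluted_probability(x, weights, means, variances) for x in readings]
        # M-step: re-estimate weights, means, and variances from the soft memberships.
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the previous estimates
        weights = [n0 / n, n1 / n]
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, readings)) / n0,
            sum(r * x for r, x in zip(resp, readings)) / n1,
        ]
        variances = [
            max(sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, readings)) / n0, 1e-6),
            max(sum(r * (x - means[1]) ** 2 for r, x in zip(resp, readings)) / n1, 1e-6),
        ]
    return weights, means, variances


def reporting_mode(p_polluted, threshold=0.5):
    """Hypothetical policy: detailed reporting above the threshold, summaries below it."""
    return "detailed" if p_polluted >= threshold else "summary"


# Synthetic readings (hypothetical concentrations): a clean group and a polluted group.
random.seed(0)
readings = [random.gauss(12.0, 3.0) for _ in range(200)]
readings += [random.gauss(55.0, 8.0) for _ in range(100)]

weights, means, variances = fit_two_zone_mixture(readings)
for x in (10.0, 30.0, 60.0):
    p = polluted_probability(x, weights, means, variances)
    print(f"reading={x:5.1f}  P(polluted)={p:.3f}  mode={reporting_mode(p)}")
```

In this sketch, lowering `threshold` moves more sensors into detailed reporting (more redundancy, finer monitoring), while raising it moves more sensors into summary reporting (less traffic, coarser monitoring), which is one simple way to read the trade-off governed by the membership probability.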