Abstract
Carbon prices are influenced by multiple factors, which makes their fluctuations difficult to predict accurately. To address this challenge, a multisource, spatiotemporal federated learning framework with cross-modal feature fusion is proposed. First, a three-level hierarchical federated learning network is designed, consisting of perception clients, regional (edge) nodes, and a central server. The server incrementally aggregates the parameters that each perception client's local large model produces by training on incremental data, which improves the efficiency of parameter aggregation in federated learning and avoids exposing raw local data over the network. Second, a cross-modal, spatiotemporal enhanced attention model is proposed. Spatiotemporal feature encoding is adopted to extract joint features from the carbon price time series and its spatial correlations. Cross-modal alignment is adopted so that market factors and carbon emission data share an aligned semantic space in the embedding layer. Finally, experimental results demonstrate that the proposed framework can effectively predict carbon prices.
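As an illustration of the three-level aggregation path (perception clients, regional nodes, central server), the following minimal sketch may help. It is not the framework's actual implementation: the abstract does not state the aggregation rule, so a sample-count-weighted average in the style of FedAvg is assumed, and the incremental step is assumed to be a simple blend controlled by a hypothetical `rate` parameter. All function names are illustrative.

```python
from typing import List, Tuple

Params = List[float]
Update = Tuple[Params, int]  # (model parameters, number of training samples)


def weighted_average(updates: List[Update]) -> Update:
    """Average parameter vectors, weighting each by its sample count (assumed rule)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    avg = [sum(p[i] * n for p, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_aggregate(regions: List[List[Update]]) -> Params:
    """Aggregate clients at each regional node, then regions at the server."""
    regional_updates = [weighted_average(clients) for clients in regions]
    global_params, _ = weighted_average(regional_updates)
    return global_params


def incremental_update(current: Params, aggregated: Params, rate: float = 0.5) -> Params:
    """Blend the new aggregate into the current global model (assumed incremental rule)."""
    return [(1.0 - rate) * c + rate * a for c, a in zip(current, aggregated)]


# Two regions: the first with two clients, the second with one.
regions = [
    [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    [([5.0, 6.0], 60)],
]
aggregated = hierarchical_aggregate(regions)
new_global = incremental_update([0.0, 0.0], aggregated, rate=0.5)
```

Only parameter vectors and sample counts travel up the hierarchy in this sketch; the raw client data never leaves the client, which is the privacy property the framework relies on.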