Abstract
Multiphysics, multiscale climate models, such as the Energy Exascale Earth System Model (E3SM), generate massive volumes of data over extended time periods to support long-term climate analysis. Data compression methods, both lossy and lossless, have been widely used to manage these datasets. Implicit Neural Representation (INR) has recently emerged as a promising lossy compression technique. While INRs offer good compression rates, they often suffer from reconstruction errors that can impede downstream climate analysis. To address this, we propose a Context-Aware Implicit Neural Representation (CA-INR), based on a multi-layer perceptron (MLP) architecture, that takes both spatiotemporal coordinates and auxiliary physical variables, referred to as context, as inputs. The model is trained to memorize the data with the explicit goal of overfitting, thereby enabling accurate reconstruction of the original data. The inclusion of context allows the model to better capture the underlying structures and correlations in Earth system data. We evaluate different CA-INR architectures on the surface temperature variable from the E3SM dataset and investigate how the type, quality, and number of contextual inputs (specifically topography, mean climatological temperature, and their combination) affect compression gain and reconstruction error. Our results demonstrate that incorporating contextual information reduces reconstruction error while maintaining a high compression rate, outperforming standard INR models. The resulting increase in peak signal-to-noise ratio (PSNR) is substantial, raising the quality of CA-INR reconstructions to a level suitable for downstream climate analysis.
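The core idea described above, an MLP that maps spatiotemporal coordinates concatenated with context variables to a field value, can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the layer sizes, sinusoidal activation, and all function names here are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    # Initialize weights and biases for a small fully connected network.
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, coords, context):
    # CA-INR input: spatiotemporal coordinates concatenated with context
    # channels (e.g. topography, mean climatological temperature).
    h = np.concatenate([coords, context], axis=-1)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.sin(h)  # sinusoidal activation, common in INR models
    return h

# (lat, lon, time) coordinates plus two context channels -> one field value
params = init_mlp([3 + 2, 64, 64, 1])
coords = rng.uniform(-1.0, 1.0, (8, 3))
context = rng.uniform(-1.0, 1.0, (8, 2))
pred = forward(params, coords, context)
print(pred.shape)  # one predicted value per input point
```

In practice such a network would be trained by minimizing reconstruction error against the original field until it overfits, after which the weights themselves serve as the compressed representation.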