Abstract
Accurate risk estimation under distribution shift is critical for deploying machine learning models in real-world spatial applications, from ecological forecasting to medical image analysis. Conventional methods such as No Weighting (NW) and Importance Weighting (IW) fail on spatially structured data because of two challenges: (1) the difficulty of density ratio estimation in high-dimensional, clustered distributions and (2) non-stationarity arising from environmental gradients or sampling biases. Classifier-based approaches offer partial improvements but often yield miscalibrated risk estimates because they prioritize discriminative accuracy over distribution alignment. We conduct a systematic evaluation of four risk estimation methods (NW, IW, Kernel Mean Matching (KMM), and classifier-based reweighting) across synthetic benchmarks with controlled spatial clustering and real-world datasets covering species distributions and immune cell layouts. Results show that KMM is the most robust of the four, reducing Mean Absolute Percentage Error (MAPE) by 12.3% to 86.5% relative to the alternatives in high-dimensional settings. This advantage stems from KMM's direct minimization of distributional divergence via kernel embeddings, which bypasses error-prone density ratio estimation. Our findings demonstrate that KMM is a principled solution for spatial risk estimation, particularly when source and target distributions exhibit complex clustering or sampling artifacts. Its consistency across ecological and biomedical domains suggests broad applicability for reliable model deployment in spatially heterogeneous environments.
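
For concreteness, the KMM reweighting and the MAPE metric mentioned above can be written out in their standard textbook form. This is a sketch under assumed notation, not necessarily the exact variant evaluated here: the source samples $(x_i^s, y_i^s)$, target samples $x_j^t$, feature map $\phi$ into a reproducing kernel Hilbert space $\mathcal{H}$, weight bound $B$, tolerance $\epsilon$, loss $\ell$, model $f$, and the number of evaluation settings $M$ are all symbols introduced for illustration.

```latex
\begin{align}
% KMM: choose source weights so the weighted source mean embedding
% matches the target mean embedding in the RKHS.
\hat{\beta} &= \arg\min_{\beta}\;
  \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} \beta_i\,\phi(x_i^s)
        - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi(x_j^t) \right\|_{\mathcal{H}}^{2} \\
&\quad \text{s.t.}\quad 0 \le \beta_i \le B, \qquad
  \left| \frac{1}{n_s}\sum_{i=1}^{n_s} \beta_i - 1 \right| \le \epsilon \\[4pt]
% Weighted estimate of the target risk from labeled source data.
\hat{R}_{\mathrm{KMM}} &= \frac{1}{n_s}\sum_{i=1}^{n_s}
  \hat{\beta}_i\,\ell\!\left(f(x_i^s),\, y_i^s\right) \\[4pt]
% MAPE of estimated risks against true target risks over M settings.
\mathrm{MAPE} &= \frac{100\%}{M}\sum_{m=1}^{M}
  \left| \frac{\hat{R}_m - R_m}{R_m} \right|
\end{align}
```

Setting $\beta_i = 1$ for all $i$ recovers the NW estimate, while IW replaces $\hat{\beta}_i$ with an estimated density ratio $p_t(x_i^s)/p_s(x_i^s)$. The KMM objective avoids estimating either density explicitly.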