Global or local modeling for XGBoost in geospatial studies upon simulated data and German COVID-19 infection forecasting



Abstract

Methods from artificial intelligence (AI), in particular machine learning and deep learning, have advanced rapidly in recent years and have been applied in many fields, including geospatial analysis. Because of spatial heterogeneity, and because conventional methods cannot mine large datasets, geospatial studies typically partition the study area into homogeneous regions and model each one locally. However, AI models can process large amounts of data, and, in theory, the more diverse the training data, the more robust a well-trained model will be. In this paper, we study a typical machine learning method, XGBoost, and ask: is it better to build a single global model or multiple local models for XGBoost in geospatial studies? To compare global and local modeling, XGBoost is first studied on simulated data and then applied to forecasting daily COVID-19 infection cases in Germany. The results indicate that if the data generated under different relationships between the independent and dependent variables are balanced and the corresponding value ranges are similar, i.e., spatial variation is low, global modeling with XGBoost is better in most cases; otherwise, local modeling is more stable and performs better, especially for the secondary data. In addition, local modeling lends itself to parallel computing because each sub-model is trained independently, but its spatial partition requires extra attention and can affect the results.
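The global-versus-local comparison described above can be illustrated with a minimal sketch. It is not the paper's experiment: an ordinary least-squares line stands in for XGBoost so that the example runs on the Python standard library alone, and the two regions, their sample sizes (900 vs. 100), slopes, and noise level are all assumptions chosen to produce an unbalanced case. Treating the under-represented region as the "secondary data" is one possible reading of the abstract, not something the source defines. The local models are fitted through a thread pool only to show that each sub-model can be trained independently.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def fit_line(points):
    """Least-squares fit of y = a*x + b (a stand-in learner, NOT XGBoost)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(model, points):
    """Root-mean-square error of a fitted (a, b) line on the given points."""
    a, b = model
    return sqrt(sum((a * x + b - y) ** 2 for x, y in points) / len(points))


def simulate(n, slope, rng):
    """Draw n points from y = slope*x + Gaussian noise (hypothetical setup)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        points.append((x, slope * x + rng.gauss(0.0, 0.1)))
    return points


rng = random.Random(0)

# Two regions with different x-y relationships and unbalanced sample sizes.
regions = {
    "region_a": simulate(900, 2.0, rng),   # well-represented region
    "region_b": simulate(100, -1.0, rng),  # under-represented region
}

# Hold out the last 20% of each region for testing.
train = {k: v[: len(v) * 4 // 5] for k, v in regions.items()}
test = {k: v[len(v) * 4 // 5:] for k, v in regions.items()}

# Global modeling: one model trained on the pooled data of all regions.
global_model = fit_line([p for pts in train.values() for p in pts])

# Local modeling: one independent model per region, fitted concurrently.
with ThreadPoolExecutor() as pool:
    local_models = dict(zip(train, pool.map(fit_line, train.values())))

global_rmse = {k: rmse(global_model, test[k]) for k in test}
local_rmse = {k: rmse(local_models[k], test[k]) for k in test}

for name in test:
    print(name, "global:", round(global_rmse[name], 3),
          "local:", round(local_rmse[name], 3))
```

In this unbalanced setup the pooled fit is dominated by the larger region, so its error on the smaller region is much higher than that of the local model. A tree-based learner such as XGBoost can split on location features, so the gap may be smaller or even reversed in practice, which is the question the paper examines.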
