An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data.

In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing different structures of nonlinearity using kernel functions and analyzing the sensitivity of the models' predictive performance to different hyperparameter settings using 3-D interpolated surfaces. In our experiment, the model that incorporates the highest degree of nonlinearity (Gaussian kernel) had the best in-sample performance but also yielded the worst out-of-sample predictions, a typical example of overfitting in a machine learning model. By contrast, the linear kernel function performed poorly in-sample but generated the best out-of-sample forecasts. The findings of this paper provide an empirical assessment of fundamental concepts in data analysis and underscore the need for caution when applying machine learning models to support real-world decision making, notably with respect to the challenges arising from the COVID-19 pandemic.
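
The in-sample versus out-of-sample contrast described above can be illustrated with a minimal sketch. It is not the paper's experiment: it swaps support vector regression for kernel ridge regression (a simpler kernel method that needs only a linear solve), uses a synthetic trending series instead of COVID-19 case counts, and every name and parameter value (`gamma`, `ridge`, the 30/10 split) is an assumption chosen for illustration.

```python
import math

def linear_kernel(a, b):
    # Linear kernel: no nonlinearity.
    return a * b

def gaussian_kernel(a, b, gamma=50.0):
    # Gaussian (RBF) kernel; gamma is an assumed value, not taken from the paper.
    return math.exp(-gamma * (a - b) ** 2)

def solve(A, y):
    # Solve A x = y by Gaussian elimination with partial pivoting.
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_predict(kernel, x_train, y_train, x_new, ridge=1e-3):
    # Kernel ridge regression: alpha = (K + ridge * I)^-1 y,
    # prediction at x is sum_i alpha_i * k(x_i, x).
    n = len(x_train)
    K = [[kernel(x_train[i], x_train[j]) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)
    return [sum(a * kernel(xi, xn) for a, xi in zip(alpha, x_train))
            for xn in x_new]

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Synthetic series: a linear trend plus a deterministic wiggle.
xs = [t / 10 for t in range(40)]
ys = [2.0 * x + 0.5 * math.sin(t) for t, x in enumerate(xs)]
x_train, y_train = xs[:30], ys[:30]   # in-sample window
x_test, y_test = xs[30:], ys[30:]     # out-of-sample window (extrapolation)

results = {}
for name, kernel in [("linear", linear_kernel), ("gaussian", gaussian_kernel)]:
    fitted = fit_predict(kernel, x_train, y_train, x_train)
    forecast = fit_predict(kernel, x_train, y_train, x_test)
    results[name] = {"in": rmse(fitted, y_train), "out": rmse(forecast, y_test)}
    print(f"{name:9s} in-sample RMSE={results[name]['in']:.4f} "
          f"out-of-sample RMSE={results[name]['out']:.4f}")
```

On this toy series the Gaussian kernel tracks the training window more closely, but its forecasts decay toward zero once the inputs move away from the training points, so it extrapolates worse than the linear kernel. This mirrors the qualitative pattern reported in the abstract, not its numbers.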

Journal:	Chaos Solitons & Fractals	Impact factor:	5.600
Year:	2020	Citation:	2020 Oct;139:110055
doi:	10.1016/j.chaos.2020.110055
