Time-aware forecasting of search volume categories and actual purchase

Abstract

E-commerce has attracted businesses of all sizes, retailers, and individuals, creating an ongoing need for applications that can predict trending products and the optimal time to sell them. This research aims to help businesses forecast demand for various product categories by applying data mining algorithms to multivariate time series data. To keep the information current, real-time data was gathered through APIs as the first building block of this research. Search volume was derived from the Keywords Everywhere tool, while Amazon's search volume and external features describing actual purchases were derived from the Helium 10 tool. The harvested raw datasets went through multiple processing steps to generate the final dataset, which was then validated. XGBoost, Linear Regression, Random Forest, Long Short-Term Memory (LSTM), and K-Nearest Neighbors (KNN) models were employed to predict the trends, and their performance was evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R2). Overall, Linear Regression performed best, especially at a correlation coefficient of 0.9, with R2 = 90.688, MAE = 0.038, MSE = 0.003, and RMSE = 0.057. KNN performed best at a correlation coefficient of 0.7, with R2 = 85.129, MAE = 0.045, MSE = 0.005, and RMSE = 0.068. XGBoost produced its best results at a correlation coefficient of 0.9, with R2 = 85.89, MAE = 0.042, MSE = 0.004, and RMSE = 0.062. Random Forest achieved its peak metrics at a correlation coefficient of 0.6, with R2 = 84.854, MAE = 0.041, MSE = 0.004, and RMSE = 0.066.
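The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, and the sketch returns R2 as a fraction, whereas the abstract's R2 figures appear to be reported on a 0 to 100 scale (an assumption based on their magnitude).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 for two equal-length numeric sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, reused for both MSE and R2.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the true values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    # R2 is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up, normalized values purely for demonstration.
y_true = [0.10, 0.40, 0.35, 0.80]
y_pred = [0.12, 0.35, 0.40, 0.70]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE is the square root of MSE, the two reported values should agree up to rounding; for example, an RMSE of 0.057 squares to roughly 0.003, which matches the Linear Regression figures above.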
