Cyclical hybrid imputation technique for missing values in data sets

Abstract

Handling missing data is one of the first and most important problems to address in the preprocessing phase, because incorrect imputation of missing values increases the error in the modeling phase and reduces the prediction performance of the model. In health applications, models with a higher success rate are naturally preferred. When data are missing, the performance of machine learning models can vary with the amount of complete data remaining in the data set, so a high rate of missing values affects the accuracy and reliability of analysis and modeling studies. Estimating the missing values precisely, close to their real values, can provide a visible performance increase in the subsequent modeling phase. A model trained on data imputed with an artificial intelligence model, rather than a random method, is expected to be more accurate than a model trained on data filled with classical methods such as the mean and mode. In this study, we propose a new algorithm, tested on many datasets, to address the problems that arise when imputing missing data. The algorithm aims to impute missing values more effectively by using row-based and column-based imputation techniques together and cyclically. Column-based imputation accounts for individual missing values, while row-based imputation accounts for the overall data structure. On the 3 datasets used in the study, the proposed algorithm achieved 100% accuracy with some combinations of row-based and column-based imputation techniques, and higher accuracy than the other imputation techniques it was compared against.
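
The abstract does not specify which row-based and column-based techniques are combined, or how the cycle is scheduled, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes numeric data with missing cells marked as NaN and at least one observed value in every column. The column-based step is assumed here to be a least-squares regression of each column on the others, and the row-based step a k-nearest-rows average; the function names `column_pass`, `row_pass`, and `cyclical_hybrid_impute`, and the parameters `k`, `max_cycles`, and `tol`, are all hypothetical.

```python
import numpy as np


def column_pass(filled, mask):
    """Column-based step (assumed): regress each column on the others by
    least squares and overwrite only the originally missing cells."""
    out = filled.copy()
    n_rows, n_cols = filled.shape
    for j in range(n_cols):
        miss = mask[:, j]
        if not miss.any():
            continue  # nothing to impute in this column
        others = np.delete(filled, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])  # intercept + predictors
        coef, *_ = np.linalg.lstsq(design[~miss], filled[~miss, j], rcond=None)
        out[miss, j] = design[miss] @ coef
    return out


def row_pass(filled, mask, k=3):
    """Row-based step (assumed): re-estimate each missing cell as the mean of
    that column over the k nearest rows, using Euclidean distance on the
    current (partly imputed) values."""
    out = filled.copy()
    k = min(k, filled.shape[0] - 1)
    for i in np.flatnonzero(mask.any(axis=1)):
        dist = np.linalg.norm(filled - filled[i], axis=1)
        dist[i] = np.inf  # a row is not its own neighbour
        neighbours = np.argsort(dist)[:k]
        out[i, mask[i]] = filled[neighbours][:, mask[i]].mean(axis=0)
    return out


def cyclical_hybrid_impute(data, k=3, max_cycles=20, tol=1e-6):
    """Alternate the column-based and row-based steps until the imputed
    values stop changing or max_cycles is reached."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    # Start from column means so both steps have a complete matrix to work on.
    filled = np.where(mask, np.nanmean(data, axis=0), data)
    for _ in range(max_cycles):
        previous = filled
        filled = row_pass(column_pass(filled, mask), mask, k)
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Calling `cyclical_hybrid_impute(table, k=2)` on a 2-D array with NaN entries returns an array of the same shape in which observed cells are left untouched and only the missing cells are replaced. Other choices for either step, and other stopping rules, would fit the same cyclic structure.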
