The iterated score regression estimation algorithm for PCA-based missing data with high correlation

Abstract

To handle highly correlated missing data in a principal component analysis (PCA) setting, we propose a novel imputation algorithm called iterated score regression. The procedure first introduces a transformation matrix that separates the missing values and the observed values into two data blocks. It then uses these data blocks, the score matrix, and the PCA model to construct the related regression equations. We highlight how the estimate is updated at each iteration. We examine the sensitivity of the proposed algorithm to standard deviations, correlation coefficients, missing proportions, numbers of variables, and sample sizes, over different intervals of the standard deviations and correlation coefficients. For comparison with existing algorithms, we suggest modifications of three popular algorithms that are also used to deal with missing data, though not with highly correlated data. In the numerical studies we conducted, the MSE values of the proposed algorithm are always the smallest among the competitors we consider, which shows its stability and accuracy. The algorithm also shows an advantage on three real data sets with missing values, which we use as illustrations.
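The general idea of estimating scores by regression and updating the imputed values at each iteration can be illustrated with a minimal sketch. This is not the authors' iterated score regression algorithm, whose exact regression equations are not given in the abstract. It is a generic iterative PCA imputation, assuming that missing entries are coded as NaN, that boolean masks play the role of the transformation matrix by splitting each row into an observed block and a missing block, and that scores are estimated by least squares on the observed block. The function name `iterative_pca_impute` and its parameters are hypothetical.

```python
import numpy as np

def iterative_pca_impute(X, n_components=1, max_iter=200, tol=1e-8):
    """Generic iterative PCA imputation (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()
    # Initial guess: fill each missing entry with its column mean.
    X_hat = np.where(missing, np.nanmean(X, axis=0), X)
    rows = np.flatnonzero(missing.any(axis=1))
    for _ in range(max_iter):
        mu = X_hat.mean(axis=0)
        Xc = X_hat - mu
        # PCA model from the current completed data: loadings P (p x k).
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T
        X_new = X_hat.copy()
        for i in rows:
            m = missing[i]   # missing block of row i
            o = ~m           # observed block of row i
            if not o.any():
                continue     # nothing observed: keep the current fill
            # Regress the observed block on the matching loadings
            # to estimate the score vector of row i.
            t, *_ = np.linalg.lstsq(P[o], Xc[i, o], rcond=None)
            # Predict the missing block from the PCA model.
            X_new[i, m] = mu[m] + P[m] @ t
        change = np.sqrt(np.mean((X_new[missing] - X_hat[missing]) ** 2))
        X_hat = X_new
        if change < tol:
            break
    return X_hat

# Usage on synthetic, highly correlated data (an assumed test setup).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X_full = z * np.array([1.0, 0.9, 0.8, 1.1]) + 0.01 * rng.normal(size=(200, 4))
mask = rng.random(X_full.shape) < 0.1
mask[mask.all(axis=1)] = False   # keep at least one observed value per row
X_obs = np.where(mask, np.nan, X_full)
X_imp = iterative_pca_impute(X_obs, n_components=1)
mse = np.mean((X_imp[mask] - X_full[mask]) ** 2)
```

In this sketch `mse` plays the role of the MSE criterion mentioned above: it compares the imputed entries with the true values that were masked out, which is only possible on simulated data.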
