Abstract
In high-dimensional linear regression, the dimension of variables is always greater than the sample size. In this situation, the traditional variance estimation technique based on ordinary least squares constantly exhibits a high bias even under sparsity assumption. One of the major reasons is the high spurious correlation between unobserved realized noise and several predictors. To alleviate this problem, a refitted cross-validation (RCV) method has been proposed in the literature. However, for a complicated model, the RCV exhibits a lower probability that the selected model includes the true model in case of finite samples. This phenomenon may easily result in a large bias of variance estimation. Thus, a model selection method based on the ranks of the frequency of occurrences in six votes from a blocked 3×2 cross-validation is proposed in this study. The proposed method has a considerably larger probability of including the true model in practice than the RCV method. The variance estimation obtained using the model selected by the proposed method also shows a lower bias and a smaller variance. Furthermore, theoretical analysis proves the asymptotic normality property of the proposed variance estimation.