Abstract
Validation of genomic predictions or polygenic risk scores is key for model selection and for evaluating the performance of the chosen prediction machinery. Non-parametric validation, such as cross-validation, is popular but does not account for population structure or for the fact that interest may lie in validating a specific set of individuals rather than the entire population. Semi-parametric methods, such as the LR method, also use removed records to validate predictions, but they account for population structure and allow a focus on a specific set of individuals of interest. They also provide confidence intervals without the need for repeated cross-validation. We developed a tool within the Blupf90 software suite, called validationf90, that allows researchers to conduct semi-parametric validation using the solutions obtained from that software suite. validationf90 calculates different validation statistics, reflecting the bias and accuracy of genomic predictions, together with their confidence intervals, for a pre-defined set of individuals of interest. The program supports genomic predictions obtained from frequentist and Bayesian methods, as well as categorical data. validationf90 can validate any model supported by the Blupf90 software suite and can be used with animal, plant, and human datasets. Predictions obtained with other software can be provided to validationf90 as long as the input format matches the Blupf90 format.