Abstract
BACKGROUND: Genomic selection relies on a variety of statistical and machine learning methods to predict phenotypes from genomic data. Since no single method consistently outperforms the others across datasets, evaluating and comparing model performance is essential. However, standard evaluation metrics such as Pearson's correlation coefficient and mean squared error treat genomic prediction as a regression problem, assessing overall fit rather than the effectiveness of selecting top-performing individuals for breeding. This disconnect can lead to suboptimal model selection in practice.

RESULTS: To address this, we present the normalized cumulative gain (NCG) as an alternative evaluation measure that directly quantifies the phenotypic gain achieved by the individuals selected by the model. We applied this measure to four animal and plant datasets to compare nine commonly used methods for genomic prediction.

CONCLUSIONS: NCG offers an intuitive and interpretable measure of selection efficiency, focusing solely on the individuals that would actually be chosen. We further demonstrate that calculating the performance under all possible selection thresholds provides more information than a single threshold or a few arbitrary ones. This more granular analysis shows that the performance of the methods may differ under varying selection intensities and can provide guidance for the choice of selection intensity. Our approach is implemented in R and is available at https://github.com/FelixHeinrich/GS_Comparison_with_NCG.
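The abstract does not state NCG's formula. A minimal Python sketch of one common formulation of normalized cumulative gain, assuming NCG at selection threshold k is the phenotypic sum of the k individuals ranked highest by the model divided by the phenotypic sum of the true top-k individuals (the function name and this normalization are illustrative, not necessarily the paper's exact definition, which is given in the linked R implementation):

```python
import numpy as np

def ncg(y_true, y_pred, k):
    """Normalized cumulative gain at selection threshold k.

    y_true: observed phenotypes; y_pred: model-predicted phenotypes.
    Returns the phenotypic sum of the model's top-k selections divided
    by the best achievable top-k sum (1.0 = perfect selection).
    Assumes positive phenotypes so the ratio is well defined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    selected = np.argsort(y_pred)[::-1][:k]   # indices of the model's top-k picks
    ideal = np.sort(y_true)[::-1][:k]         # phenotypes of the true top-k
    return y_true[selected].sum() / ideal.sum()

# Evaluating NCG under all possible thresholds, as the abstract suggests,
# yields a curve rather than a single number:
def ncg_curve(y_true, y_pred):
    n = len(y_true)
    return [ncg(y_true, y_pred, k) for k in range(1, n + 1)]
```

A model that ranks individuals perfectly scores 1.0 at every threshold; comparing the full curves of two models can reveal that one is better at stringent selection intensities (small k) while the other wins at lenient ones.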