Abstract
BACKGROUND: Gene expression profiles potentially hold valuable information for predicting breeding values and phenotypes. However, in practical breeding programs, most individuals in the reference population have only genomic data and lack transcriptomic data. Predicting gene expression from genetic markers and integrating this genetically predicted gene expression into genomic prediction may offer a solution.
RESULTS: This study extends kernel ridge regression (KRR) to weighted multiple kernel ridge regression (WMKRR), which uses a multiple kernel learning (MKL) approach to integrate genomic data with transcriptomic data predicted from genetic markers. We compared the predictive ability of WMKRR with that of traditional genomic best linear unbiased prediction (GBLUP) and combined genomic and transcriptomic best linear unbiased prediction (GTBLUP), both with and without genotype feature selection, in two datasets: (i) 3305 simulated records based on the Cattle Genotype-Tissue Expression (CattleGTEx) dataset and (ii) real data from 5515 dairy cattle. WMKRR yielded higher predictive ability than GBLUP and GTBLUP in both the simulated and the real dairy cattle data. For the simulated data based on CattleGTEx, WMKRR improved predictive ability over GBLUP and GTBLUP by an average of 1.12% and 1.13%, respectively, without feature selection, and by 3.17% and 3.23%, respectively, with feature selection. For the real dairy cattle data, in cross-validation, WMKRR improved over GBLUP and GTBLUP by an average of 5.56% and 7.23%, respectively, without feature selection, and by 5.66% and 6.40%, respectively, with feature selection. In forward validation, WMKRR improved over GBLUP and GTBLUP by an average of 5.68% and 8.41%, respectively, without feature selection, and by 4.66% and 7.06%, respectively, with feature selection.
CONCLUSIONS: Our results demonstrate that the WMKRR model, which integrates genomic and genetically predicted transcriptomic data, achieves better predictive performance than traditional genomic prediction models. This study shows the potential of using omics data to enhance genomic breeding applications without additional omics sequencing costs.
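The core idea of WMKRR, as summarized above, is to combine one kernel per data source (genotypes and genetically predicted expression) into a single weighted kernel and then solve an ordinary kernel ridge regression. The sketch below illustrates that idea only. The abstract does not specify the kernel functions or how the weights are learned, so the following are all assumptions rather than the authors' method: linear kernels scaled by the number of features, fixed user-supplied weights, a ridge penalty `lam`, and the hypothetical helper names `linear_kernel` and `wmkrr_fit_predict`.

```python
import numpy as np


def linear_kernel(X, Z):
    # Assumed GBLUP-like linear kernel, scaled by the number of features.
    return X @ Z.T / X.shape[1]


def wmkrr_fit_predict(train_blocks, test_blocks, y, weights, lam=1.0):
    """Hypothetical weighted multiple kernel ridge regression sketch.

    train_blocks / test_blocks: one feature matrix per data source
    (e.g. marker genotypes, genetically predicted expression).
    weights: fixed non-negative kernel weights (assumed given, not learned).
    """
    n = len(y)
    # Weighted sum of per-source training kernels: K = sum_m w_m * K_m
    K = sum(w * linear_kernel(Xtr, Xtr)
            for w, Xtr in zip(weights, train_blocks))
    # Matching test-by-train kernel with the same weights
    K_test = sum(w * linear_kernel(Xte, Xtr)
                 for w, Xte, Xtr in zip(weights, test_blocks, train_blocks))
    mu = y.mean()
    # Ridge solution on centered phenotypes: alpha = (K + lam * I)^-1 (y - mu)
    alpha = np.linalg.solve(K + lam * np.eye(n), y - mu)
    return mu + K_test @ alpha
```

Setting the weights to `[1.0, 0.0]` reduces the model to single-kernel KRR on genotypes alone. In practice, the weights would be tuned, for example by cross-validation, rather than fixed by hand.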