Abstract
Genomic prediction using whole-genome sequencing (WGS) data is challenged by the imbalance between a limited sample size (n) and a very large number of single-nucleotide polymorphisms (SNPs) (p), where n ≪ p. The high dimensionality of WGS data also increases computational demands, limiting its practical application. In this study, we introduce DAGP, a novel method that uses deep autoencoder compression to reduce WGS data dimensionality by over 99% while preserving essential genetic information. This compression significantly improves computational efficiency, facilitating the effective use of high-dimensional genomic data. Our results demonstrated that DAGP, when combined with the genomic best linear unbiased prediction (GBLUP) method, maintained prediction accuracy comparable to that obtained with the full WGS data, even at reduced marker densities of 50 K for sturgeon and 20 K for maize. Furthermore, integrating DAGP with Bayesian and machine learning models improved genomic prediction accuracy over traditional WGS-based GBLUP, with average gains of 6.05% and 5.35%, respectively. DAGP provides an efficient and scalable solution for genomic prediction in species with large-scale genomic data, offering both computational feasibility and enhanced prediction performance.
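As a rough illustration of the compress-then-predict idea summarized above, the following Python sketch runs on synthetic data. It is not the DAGP implementation: a truncated SVD (the optimal solution of a linear autoencoder) stands in for the deep autoencoder, and the function names, the latent size k = 8, the simulated marker matrix, and the fixed variance ratio `lam` are all assumptions made for illustration.

```python
import numpy as np

def compress_genotypes(X, k):
    """Linear-autoencoder stand-in (NOT the paper's deep autoencoder):
    project column-centered markers onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * s[:k]              # n x k latent codes
    return Z / Z.std()                # global rescale so G has mean diagonal ~1

def gblup_predict(Z, y, train, test, lam=1.0):
    """GBLUP-style prediction from latent codes.
    lam = sigma_e^2 / sigma_g^2 is assumed known here rather than estimated."""
    G = Z @ Z.T / Z.shape[1]          # relationship matrix built from latent features
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def simulate(n=120, p=2000, n_factors=5, seed=0):
    """Synthetic n << p data with low-rank structure (hypothetical, continuous markers)."""
    rng = np.random.default_rng(seed)
    F = rng.normal(size=(n, n_factors))
    X = F @ rng.normal(size=(n_factors, p)) + 0.5 * rng.normal(size=(n, p))
    y = F @ rng.normal(size=n_factors) + 0.1 * rng.normal(size=n)
    return X, y

X, y = simulate()
Z = compress_genotypes(X, k=8)
train, test = np.arange(90), np.arange(90, 120)
y_hat = gblup_predict(Z, y, train, test)

reduction = 1 - Z.shape[1] / X.shape[1]            # fraction of dimensions removed
accuracy = np.corrcoef(y_hat, y[test])[0, 1]       # correlation with held-out phenotypes
print(f"dimension reduction: {reduction:.1%}, held-out correlation: {accuracy:.2f}")
```

In this toy setting, 2,000 markers are replaced by 8 latent features, so the relationship matrix is built from an n × 8 matrix instead of an n × 2,000 one. A nonlinear deep autoencoder, as used in DAGP, would replace `compress_genotypes`, while the downstream prediction step could stay the same or be swapped for a Bayesian or machine learning model.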