Abstract
Traditional breeding programs have largely focused on genetics, often overlooking environmental and epigenetic influences on phenotypic variability. Current methods for developing epigenetic biomarkers (EBs) with machine learning (ML) algorithms require extensive data, making them costly and time-intensive. In this study, using a fish as a model, we analysed ~500 000 CpG loci in samples from 60 different families to develop EBs for broodstock selection. To address limited sample sizes at the sequencing stage, we combined careful sample selection, statistical filtering, and various feature selection and ML algorithms. As a result, we identified three heritable CpG sites in sire sperm associated with three key performance indicators in their offspring: biomass, fast-growing females, and resistance to the masculinizing effects of high temperature. We then built a model that successfully predicted the best sire broodstock based on DNA methylation levels of these EBs. This model was validated across three independent trials, including one involving an external cohort of fish with a differentiated genetic background, thereby confirming its robustness beyond the training population. Yield increased up to 1.4-fold when epigenetic selection was included in the genetic selection program, compared with genetic selection alone. In summary, we present a cost-effective strategy for integrating epigenetic and genetic selection in the context of animal production. Furthermore, this method can also be applied to assess the impact of environmental factors on the broodstock and to samples where obtaining information can be challenging, such as in the study of the epigenetic basis of rare diseases and the application of epigenetic markers in conservation biology.