Abstract
BACKGROUND: Prunus Avium L. dormancy is a complex physiological process that allows floral outbreaks to survive adverse winter conditions and resume favorable spring growth. Traditional phenological evaluations and agroclimatic models, although widely used, exhibit limited resolution and robustness over the years and cultivars. Epigenetic mechanisms, particularly DNA methylation, have emerged as critical regulators of dormancy transitions. However, the integration of methylation data with automatic learning tools (ML) for predictive modeling remains largely unexplored in perennial species. This study presents an integrative frame that combines whole-genome bisulfite sequencing and supervised ML to identify methylation markers at the cytosine and region level associated with specific dormancy stages in the sweet cherry. METHODS: DNA methylation data sets from three different experiments underwent classification using Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), complemented by SHapley Additive exPlanations (SHAP) for interpretability. The importance of the features was evaluated using the Integrated Model consensus in the RF, XGBoost, and SHAP metrics. RESULTS: The selection of features significantly improved the classification performance in the three-stages models (paradormancy, endodormancy, ecodormancy) and two-stages (endodormancy and ecodormancy). RF constantly exceeded XGBoost, achieving an accuracy of up to 97.1% in the two-stages scenario using informative cytosine level data. The SHAP analyses demonstrated that the selected feature effectively discriminated among stages of dormancy and revealed biologically significant epigenetic features. The key features were distributed not random throughout the genome, often colocalizing with transposable elements of long terminal repetition (LTR), particularly LTR/ty3-retrotransposons and LTR/copia families. Some features also co-localize with QTLs for chilling and heat requirement, flowering time and maturity date previously identified. CONCLUSIONS: This study highlights the usefulness of combining high-resolution methylation data with interpretable ML techniques to identify robust dormancy biomarkers. The enrichment of the features associated with dormancy within the transposable elements and the proximal regions of genes suggests an epigenetic regulation through the remodeling of chromatin mediated by TE. These findings contribute to a deeper understanding of dormancy mechanisms and offer a basis for the development of non-destructive tools based on methylation to improve phenological management in perennial fruit crops.