Abstract
BACKGROUND: Plant phenomics has progressed significantly in recent years, creating a demand to move from external characterization to internal exploration by combining data types. Hyperspectral and metabolomic data, which are causally related, are priority candidates for integration. However, few efficient integration methods are available.
RESULTS: Here, we show how to explore hyperspectral data by combining them with upper-level metabolomic data, performing dimension reduction guided by the higher-level data in a target-trait-oriented manner to achieve high analysis efficiency. To verify feasibility, we designed a two-stage pipeline combining hyperspectral and metabolomic data to discriminate the salt-tolerant phenotype among Medicago truncatula mutants. Centered on salt tolerance, the data are combined in two ways: in primary screening, by constructing metabolite-based spectral indices that outline tolerance-related metabolic changes; and in secondary screening, by building models that convert hyperspectral data to metabolite content for detailed characterization. The target phenotype could be discriminated after five days of salt treatment, much earlier than phenotypic differences appeared. Twenty mutants with a salt-tolerant phenotype were identified from about 1000 mutants, almost three times the number obtained by unintegrated analysis. The accuracy rate, confirmed experimentally by salt-tolerance analysis, reached 90 %, and could theoretically be raised to 100 % using results from hierarchical-clustering-assisted Principal Component Analysis.
CONCLUSIONS: The mutant-screening pipeline presented here is a practical example of targeted data integration and data mining guided by upper-layer omic data. Targeted combination of phenomic and metabolomic data enables accurate phenotype discrimination and prediction from both external and internal aspects, providing a powerful tool for phenotype selection in new-generation crop breeding.
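
The two ways of combining the data can be illustrated with a minimal sketch. It is not the pipeline used in this work: the normalized-difference form of the spectral index, the exhaustive band-pair search, the linear model, the helper names (`ndsi`, `best_ndsi_pair`), and the synthetic data are all assumptions made purely for illustration, since the abstract does not specify them. The sketch first finds the band pair whose index tracks a metabolite most closely (a stand-in for a metabolite-based spectral index), then fits a model that converts the index to metabolite content.

```python
import numpy as np


def ndsi(reflectance, i, j):
    """Normalized-difference index (R_i - R_j) / (R_i + R_j) per plant (assumed form)."""
    a, b = reflectance[:, i], reflectance[:, j]
    return (a - b) / (a + b)


def best_ndsi_pair(reflectance, metabolite):
    """Search all band pairs and return (i, j, r) with the largest |Pearson r|
    between the index and the measured metabolite content."""
    n_bands = reflectance.shape[1]
    best = (0, 1, 0.0)
    for i in range(n_bands):
        for j in range(i + 1, n_bands):
            r = np.corrcoef(ndsi(reflectance, i, j), metabolite)[0, 1]
            if abs(r) > abs(best[2]):
                best = (i, j, r)
    return best


# Synthetic stand-in data (hypothetical): 60 plants, 20 spectral bands.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0.2, 0.8, size=(60, 20))
# Assume the metabolite depends on bands 3 and 12, plus measurement noise.
metabolite = 5.0 * ndsi(reflectance, 3, 12) + rng.normal(0.0, 0.05, size=60)

# Step 1: metabolite-guided index construction (dimension reduction to one index).
i, j, r = best_ndsi_pair(reflectance, metabolite)

# Step 2: linear model converting the spectral index to metabolite content.
slope, intercept = np.polyfit(ndsi(reflectance, i, j), metabolite, 1)
predicted = slope * ndsi(reflectance, i, j) + intercept

print(f"selected bands: ({i}, {j}), r = {r:.3f}, slope = {slope:.2f}")
```

In this toy setting the metabolite measurement guides the search from 20 bands down to a single two-band index, which is the sense in which higher-level data can direct dimension reduction of the spectra.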