Integration of inter-simple sequence repeats with machine learning approach for diversity analysis and authentication of Iranian cotton cultivars

Abstract

Cotton (Gossypium hirsutum L.) has undergone extensive breeding in recent decades, which has narrowed its genetic base and made accurate germplasm differentiation and cultivar authentication more difficult. This study addresses the lack of reliable, scalable, and interpretable tools for distinguishing closely related Iranian cotton cultivars. To this end, it integrates inter-simple sequence repeat (ISSR) markers with machine learning (ML) algorithms to evaluate genetic diversity and establish diagnostic criteria for cultivar identification. Eighteen commercial cultivars were genotyped with 14 ISSR primers, and the binary scores (presence/absence of bands) were used to calculate genetic diversity parameters: the observed number of alleles (Na), effective number of alleles (Ne), Shannon's information index (I), and expected heterozygosity (He). Primers 13, 10, and 26 were identified as the most informative primers, yielding the highest values across these parameters. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering and principal coordinates analysis (PCoA) revealed five cultivar groups, with several accessions (e.g., Jahesh, Fakhr, Sahel) showing marked genetic distinctiveness. To strengthen cultivar authentication, the ISSR data were then analyzed with ML classifiers: a decision tree model generated transparent band-based rules, and Random Forest feature selection highlighted key diagnostic loci (Primer24_525, Primer2_766). The combined framework achieved high classification accuracy and reproducibility, enabling reliable discrimination among closely related cultivars. These findings demonstrate the novelty and practical utility of integrating multilocus ISSR markers with ML for cultivar authentication, seed certification, and genetic resource management, and they highlight previously underexplored genetic diversity that can inform cotton breeding programs in Iran.
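
The four diversity parameters can be computed band by band and then averaged per primer. The sketch below assumes one common convention for dominant ISSR data, in which the band frequency p among cultivars is used as an allele frequency and q = 1 - p (a haploid-style treatment), giving Ne = 1/(p^2 + q^2), I = -(p ln p + q ln q), and He = 1 - (p^2 + q^2). The abstract does not state which convention the study used, so the function names and example scores here are hypothetical, not study data.

```python
import math
from statistics import mean


def band_stats(scores):
    """Diversity indices for one ISSR band scored 1 (present) / 0 (absent).

    Assumption: the band frequency p is used as an allele frequency and
    q = 1 - p (haploid-style treatment of dominant markers).
    """
    p = sum(scores) / len(scores)
    q = 1.0 - p
    freqs = [f for f in (p, q) if f > 0]  # alleles actually observed
    homozygosity = sum(f * f for f in freqs)  # p^2 + q^2
    return {
        "Na": len(freqs),  # observed number of alleles (1 or 2)
        "Ne": 1.0 / homozygosity,  # effective number of alleles
        "I": -sum(f * math.log(f) for f in freqs),  # Shannon's information index
        "He": 1.0 - homozygosity,  # expected heterozygosity
    }


def primer_summary(bands):
    """Average each index over all bands amplified by one primer."""
    per_band = [band_stats(b) for b in bands]
    return {key: mean(s[key] for s in per_band) for key in per_band[0]}


# Hypothetical scores: two bands from one primer across six cultivars.
example = [
    [1, 1, 0, 0, 1, 0],  # polymorphic band, p = 0.5
    [1, 1, 1, 1, 1, 1],  # monomorphic band, p = 1.0
]
summary = primer_summary(example)
print({k: round(v, 3) for k, v in summary.items()})
# → {'Na': 1.5, 'Ne': 1.5, 'I': 0.347, 'He': 0.25}
```

Under this convention He and I both peak at p = 0.5, so primers whose bands split the cultivars roughly evenly score highest, while monomorphic bands pull a primer's averages down.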

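UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every remaining cluster to the size-weighted arithmetic mean of the two old distances. The minimal pure-Python sketch below assumes a Jaccard distance on band presence (a common choice for dominant markers, although the abstract does not name the similarity coefficient used); the cultivar labels cv1 to cv4 and their band profiles are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - Jaccard similarity; shared band absences are ignored."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma(profiles):
    """Return UPGMA merge steps as (cluster_a, cluster_b, merge_distance)."""
    # Each cluster is identified by the tuple of cultivar names it contains.
    clusters = [(name,) for name in profiles]
    dist = {
        frozenset((a, b)): jaccard_distance(profiles[a[0]], profiles[b[0]])
        for a, b in combinations(clusters, 2)
    }
    merges = []
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = tuple(pair)
        height = dist.pop(pair)
        merged = a + b
        clusters = [c for c in clusters if c not in pair]
        for c in clusters:
            # Arithmetic-mean update, weighted by the sizes of a and b.
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (
                len(a) * d_ac + len(b) * d_bc
            ) / (len(a) + len(b))
        clusters.append(merged)
        merges.append((a, b, height))
    return merges


# Hypothetical band profiles for four cultivars (not study data).
profiles = {
    "cv1": [1, 1, 0, 0],
    "cv2": [1, 1, 0, 1],
    "cv3": [0, 0, 1, 1],
    "cv4": [0, 1, 1, 1],
}
for a, b, h in upgma(profiles):
    print(sorted(a), "+", sorted(b), "at", round(h, 3))
```

Here cv1 with cv2 and cv3 with cv4 each merge at distance 1/3, and the two pairs join at 0.75. For real data sets, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same UPGMA computation on a condensed distance matrix.
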