Species complex delimitations in the genus Hedychium: A machine learning approach for cluster discovery

姜属物种复合体界定:一种基于机器学习的聚类发现方法

阅读:2

Abstract

PREMISE: Statistical methods used by most morphologists to validate species boundaries (such as principal component analysis [PCA] and non-metric multidimensional scaling [nMDS]) are limiting because these methods are mostly used as visualization methods, and because the groups are identified by taxonomists (i.e., supervised), adding human bias. Here, we use a spectral clustering algorithm for the unsupervised discovery of species boundaries followed by the analysis of the cluster-defining characters. METHODS: We used spectral clustering, nMDS, and PCA on 16 morphological characters within the genus Hedychium to group 93 individuals from 10 taxa. A radial basis function kernel was used for the spectral clustering with user-specified tuning values (gamma). The goodness of the discovered clusters using each gamma value was quantified using eigengap, a normalized mutual information score, and the Rand index. Finally, mutual information-based character selection and a t-test were used to identify cluster-defining characters. RESULTS: Spectral clustering revealed five, nine, and 12 clusters of taxa in the species complexes examined here. Character selection identified at least four characters that defined these clusters. DISCUSSION: Together with our proposed character analysis methods, spectral clustering enabled the unsupervised discovery of species boundaries along with an explanation of their biological significance. Our results suggest that spectral clustering combined with a character selection analysis can enhance morphometric analyses and is superior to current clustering methods for species delimitation.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。