Bioinformatic insights into five Chinese population substructures inferred from the East Asian-specific AISNP panel

Abstract

BACKGROUND: Recent advances in population-specific high-quality reference databases have significantly improved the performance of forensic panel development for personal identification, parentage testing, and biogeographical ancestry inference. However, the discriminative power of previously developed AISNP panels remains limited in applications involving regional Chinese population substructures.

RESULTS: We used the high-quality Chinese population-specific genetic resource to develop six nested C5ClusterTag-50/100/250/500/1000/2000 ancestry-informative SNP panels focused on inferring population stratification among geographically and genetically distinct Chinese populations. We used comprehensive bioinformatics approaches and machine learning to validate the effectiveness of these panels in both the testing and training datasets. A total of 2,772 individuals were screened across different AISNP panels based on the I(n) value. Ancestry inference power was assessed via principal component analysis (PCA), ADMIXTURE, and uniform manifold approximation and projection (UMAP) across the 2000-AISNP, 1000-AISNP, 500-AISNP, 250-AISNP, 100-AISNP, and 50-AISNP panels. The 1000-AISNP panel demonstrated the optimal balance between the SNP count and discriminative power for forensic applications. The accuracy of the random forest model was confirmed through a confusion matrix based on machine learning.

CONCLUSIONS: These panels can differentiate ethnolinguistic Chinese populations into five subgroups based on geographical divisions or linguistic affiliations, achieving a high average accuracy rate in machine learning models. This work not only developed a robust ancestry inference panel and new tools for predicting the ancestry of ethnolinguistic Chinese populations but also created a comprehensive reference dataset and machine learning model applicable to population and forensic uses globally.
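
The abstract does not say how the I(n) value was computed or how the nested panels were assembled. As an illustrative sketch only, the Python below assumes that I(n) is Rosenberg's informativeness for assignment, that the SNPs are biallelic, and that each panel is simply the top N SNPs of one shared ranking (which makes the panels nested by construction). The function names `informativeness` and `nested_panels`, the SNP labels, and the allele frequencies are hypothetical and are not taken from the study.

```python
import math


def informativeness(freqs):
    """Informativeness for assignment (I_n) of one biallelic SNP.

    freqs: frequency of one allele in each of K populations.
    Assumed definition (Rosenberg et al.):
        I_n = sum over alleles j of
              [ -p_j * ln(p_j) + (1/K) * sum over populations i of p_ij * ln(p_ij) ]
    where p_j is the mean frequency of allele j across populations.
    """
    k = len(freqs)
    total = 0.0
    # Treat the SNP as two alleles with frequencies p and 1 - p.
    for allele_freqs in (freqs, [1.0 - p for p in freqs]):
        mean_p = sum(allele_freqs) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        # Terms with p == 0 contribute 0 (limit of p * ln p).
        total += sum(p * math.log(p) for p in allele_freqs if p > 0) / k
    return total


def nested_panels(snp_freqs, sizes):
    """Rank SNPs by I_n (highest first) and cut the ranking at each size.

    Because every panel is a prefix of the same ranking, each smaller
    panel is contained in every larger one.
    """
    ranked = sorted(
        snp_freqs,
        key=lambda snp: informativeness(snp_freqs[snp]),
        reverse=True,
    )
    return {n: ranked[:n] for n in sizes}


# Hypothetical allele frequencies in five population groups.
toy_freqs = {
    "snpA": [0.9, 0.1, 0.5, 0.5, 0.5],
    "snpB": [0.5, 0.5, 0.5, 0.5, 0.5],  # identical everywhere: uninformative
    "snpC": [1.0, 0.0, 1.0, 0.0, 1.0],  # fixed differences: most informative
}

panels = nested_panels(toy_freqs, sizes=(1, 2))
for size, snps in panels.items():
    print(size, snps)
```

Under this definition a SNP with the same frequency in every group scores 0, and the score grows as frequencies diverge between groups, so cutting one ranking at 50, 100, 250, 500, 1000, and 2000 markers would give six nested panels of the kind described.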
