Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3

Authors: Nancy F Hansen, Xunde Wang, Mickias B Tegegn, Zhi Liu, Mateus H Gouveia, Gracelyn Hill, Jennifer C Lin, Temiloluwa Okulosubo, Daniel Shriner, Swee Lay Thein, James C Mullikin

Abstract

In regions where reads do not align well to a reference, it is generally difficult to characterize structural variation using short-read sequencing. Here, we use machine learning classifiers and short sequence reads to genotype structural variants in the alpha globin locus on chromosome 16, a medically relevant region that is challenging to genotype in individuals. Using models trained only on simulated data, we accurately genotype two hard-to-distinguish deletions in two separate human cohorts. Furthermore, the population allele frequencies produced by our methods across a wide range of ancestries agree more closely with previously determined frequencies than those obtained with currently available genotyping software.
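Population allele frequencies like those compared in the abstract are derived from per-individual genotype calls. The following is a minimal sketch, assuming each deletion is treated as a biallelic variant and each call is coded as the number of deletion alleles (0, 1, or 2) carried by a diploid individual. Under that assumption, the deletion allele frequency is the count of deletion alleles divided by twice the number of genotyped individuals. The function names and population labels are hypothetical illustrations, not part of the authors' software.

```python
from collections import Counter
from typing import Iterable


def allele_frequency(genotypes: Iterable[int]) -> float:
    """Frequency of one deletion allele among diploid genotype calls.

    Hypothetical coding (an assumption for this sketch): each call is the
    number of deletion alleles an individual carries, i.e. 0 (no deletion),
    1 (heterozygous) or 2 (homozygous deletion).
    """
    counts = Counter(genotypes)
    invalid = set(counts) - {0, 1, 2}
    if invalid:
        raise ValueError(f"unexpected genotype codes: {sorted(invalid)}")
    n_individuals = sum(counts.values())
    if n_individuals == 0:
        raise ValueError("no genotype calls supplied")
    # Heterozygotes contribute one deletion allele, homozygotes contribute two.
    deletion_alleles = counts[1] + 2 * counts[2]
    # Each diploid individual carries two alleles at the locus.
    return deletion_alleles / (2 * n_individuals)


def frequencies_by_population(calls: dict[str, list[int]]) -> dict[str, float]:
    """Apply allele_frequency separately to each (hypothetical) population label."""
    return {pop: allele_frequency(gts) for pop, gts in calls.items()}


if __name__ == "__main__":
    # Ten individuals: two heterozygotes and one homozygote give 4 / 20 deletion alleles.
    print(allele_frequency([0, 0, 1, 2, 1, 0, 0, 0, 0, 0]))
    print(frequencies_by_population({"popA": [0, 1], "popB": [2, 2, 0, 0]}))
```

Computing frequencies per population, rather than pooling all calls, is what allows estimates from a genotyper to be compared against previously published ancestry-specific frequencies.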