Structural and Functional Classification of G-Quadruplex Families within the Human Genome

人类基因组中G-四链体家族的结构和功能分类

阅读:1

Abstract

G-quadruplexes (G4s) are short secondary DNA structures located throughout genomic DNA and transcribed RNA. Although G4 structures have been shown to form in vivo, no current search tools that examine these structures based on previously identified G-quadruplexes and filter them based on similar sequence, structure, and thermodynamic properties are known to exist. We present a framework for clustering G-quadruplex sequences into families using the CD-HIT, MeShClust, and DNACLUST methods along with a combination of Starcode and BLAST. Utilizing this framework to filter and annotate clusters, 95 families of G-quadruplex sequences were identified within the human genome. Profiles for each family were created using hidden Markov models to allow for the identification of additional family members and generate homology probability scores. The thermodynamic folding energy properties, functional annotation of genes associated with the sequences, scores from different prediction algorithms, and transcription factor binding motifs within a family were used to annotate and compare the diversity within and across clusters. The resulting set of G-quadruplex families can be used to further understand how different regions of the genome are regulated by factors targeting specific structures common to members of a specific cluster.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。