Identification of gene 3' ends by automated EST cluster analysis

利用自动化EST聚类分析鉴定基因3'末端

阅读:1

Abstract

The properties and biology of mRNA transcripts can be affected profoundly by the choice of alternative polyadenylation sites, making definition of the 3' ends of transcripts essential for understanding their regulation. Here we show that 22-52% of sequences in commonly used human and murine "full-length" transcript databases may not currently end at bona fide polyadenylation sites. To identify probable transcript termini over the entire murine and human genomes, we analyzed the EST databases for positional clustering of EST ends. The analysis yielded 58,282 murine- and 86,410 human-candidate polyadenylation sites, of which 75% mapped to 23,091 known murine transcripts and 22,891 known human transcripts. The murine dataset correctly predicted 97% of the 3' ends in a manually curated and experimentally supported benchmark transcript set. Of currently known genes, 15% had no associated prediction and 25% had only a single predicted termination site. The remaining genes had an average of 3-4 alternative polyadenylation sites predicted for each murine or human transcript, respectively. The results are made available in the form of tables and an interactive web site that can be mined for rapid assessment of the validity of 3' ends in existing collections, enumeration of potential alternative 3' polyadenylation sites of known transcripts, direct retrieval of terminal sequences for design of probes, and detection of polyadenylation sites not currently mapped to known genes.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。