CaTCH: Calculating transcript complexity of human genes

Abstract

Findings from whole-transcriptome sequencing suggest that alternative splicing occurs in approximately 95% of human multi-exon genes and therefore plays a crucial role in promoting proteome diversity. According to the latest GENCODE annotations, most genes have fewer than four transcripts, and the number of transcripts correlates positively with the number of exons. It is therefore more accurate to measure a gene's splice-variant efficiency relative to its number of exons; this measure is termed Transcript Complexity (TC). In addition, the theoretical number of transcripts is substantially higher than the number actually produced by alternative splicing events, and the features restricting this phenomenon need to be explored. In this method, we extracted data on various features contributing to TC from different databases. Linear regression is used to identify the determinant features and to train and test the TC model. The results indicate that exon length is the main determinant of TC, followed by coding potential, presence of a chromatin signature, and the 5' splice site dinucleotide; all of these features except exon length negatively affect a gene's TC. To further classify genes based on TC, random forest is used to identify the determinant features.

• The splicing efficiency of a gene can be inferred from its transcript complexity, which is the number of transcripts per exon.
• CaTCH is a linear regression-based model that calculates the transcript complexity of human genes from exon length, coding potential, presence of chromatin signature(s), and the 5' splice site dinucleotide.
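The TC definition and the regression step described above can be sketched as follows. This is a minimal illustration, not the CaTCH implementation: the function names, the numeric encoding of the features (for example, chromatin signature as 0/1), and the choice of ordinary least squares solved through the normal equations are assumptions made here, since the abstract does not specify how features are encoded or how the model is fitted.

```python
def transcript_complexity(n_transcripts, n_exons):
    """TC as defined in the abstract: number of transcripts per exon."""
    if n_exons <= 0:
        raise ValueError("a gene must have at least one exon")
    return n_transcripts / n_exons


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(xs, ys):
    """Ordinary least squares with an intercept via (X^T X) beta = X^T y.

    xs: one feature vector per gene (hypothetical encoding, e.g.
        [exon length, coding potential, chromatin signature 0/1, ...]).
    ys: observed TC value per gene.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return _solve(xtx, xty)


def predict(beta, x):
    """Predicted TC for one feature vector; the sign of each coefficient gives its direction of effect."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))
```

For instance, a gene with 3 transcripts and 6 exons has `transcript_complexity(3, 6) == 0.5`. The hand-written solver only keeps the sketch dependency-free; in practice a statistics library such as scikit-learn's `LinearRegression` would normally fit the model.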