TransAC4C-a novel interpretable architecture for multi-species identification of N4-acetylcytidine sites in RNA with single-base resolution

TransAC4C——一种新型可解释架构,用于以单碱基分辨率识别RNA中N4-乙酰胞嘧啶位点的多物种方法。

阅读:1

Abstract

N4-acetylcytidine (ac4C) is a modification found in ribonucleic acid (RNA) related to diseases. Expensive and labor-intensive methods hindered the exploration of ac4C mechanisms and the development of specific anti-ac4C drugs. Therefore, an advanced prediction model for ac4C in RNA is urgently needed. Despite the construction of various prediction models, several limitations exist: (1) insufficient resolution at base level for ac4C sites; (2) lack of information on species other than Homo sapiens; (3) lack of information on RNA other than mRNA; and (4) lack of interpretation for each prediction. In light of these limitations, we have reconstructed the previous benchmark dataset and introduced a new dataset including balanced RNA sequences from multiple species and RNA types, while also providing base-level resolution for ac4C sites. Additionally, we have proposed a novel transformer-based architecture and pipeline for predicting ac4C sites, allowing for highly accurate predictions, visually interpretable results and no restrictions on the length of input RNA sequences. Statistically, our work has improved the accuracy of predicting specific ac4C sites in multiple species from less than 40% to around 85%, achieving a high AUC > 0.9. These results significantly surpass the performance of all existing models.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。