FusDRM-m5C: a hybrid model for accurate prediction of 5-methylcytosine modification sites based on feature fusion and attention mechanism

Abstract

INTRODUCTION: 5-methylcytosine (m5C) is an epitranscriptomic modification fundamental to RNA function, and identifying its sites precisely is crucial but difficult to achieve experimentally. Computational prediction is a promising alternative, yet improving its accuracy and robustness remains an open objective. To address these limitations, this study introduces a deep learning framework for highly accurate m5C site prediction from RNA sequences.

METHODS: We propose FusDRM-m5C, a deep learning framework with a multi-branch architecture that processes three distinct feature types: one-hot vector representation (one-hot), Z-curve-based geometrical features (Z-curve), and local RNA secondary structure (RSS). Each feature type is processed by a separate, parallel branch. Within each branch, a Dilated Convolutional Neural Network (DCNN) captures multi-scale patterns, followed by a Multi-Head Self-Attention (MHSA) mechanism with residual connections that weighs context-dependent information. The high-level representations from the three branches are then fused by concatenation, and the fused feature vector is fed into a final fully connected network that outputs the predicted probability of an m5C site.

RESULTS: FusDRM-m5C was evaluated using both 5-fold cross-validation (CV) and independent dataset testing. In 5-fold CV on the benchmark dataset, the model achieved a Sensitivity (Sn) of 0.995, Specificity (Sp) of 0.971, Accuracy (ACC) of 0.983, Matthews correlation coefficient (MCC) of 0.966, and Area Under the Receiver Operating Characteristic Curve (AUC) of 0.997. On an independent test dataset, the model maintained strong generalization, attaining an Sn of 0.900, Sp of 0.965, ACC of 0.933, MCC of 0.867, and AUC of 0.986.
Furthermore, we assessed the cross-species prediction performance of FusDRM-m5C. The model maintained high accuracy and robustness across datasets from multiple species, outperforming several existing state-of-the-art methods.

DISCUSSION: FusDRM-m5C predicts m5C sites accurately and robustly and compares favorably with existing methods. Its architecture integrates diverse biological features through distinct, attention-equipped processing branches whose outputs are fused by concatenation, offering a powerful tool for m5C identification.
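
Two of the three input representations named in METHODS, one-hot and Z-curve, can be illustrated with a short sketch. It assumes the classical cumulative Z-curve definition; the abstract does not say which Z-curve variant, window length, or alphabet handling the authors use, so the function names and the toy sequence are hypothetical. The RSS branch needs an external RNA folding tool and is omitted.

```python
def one_hot(seq):
    """Encode an RNA sequence as one 4-dim vector per base, in A, C, G, U order.

    Bases outside the alphabet (e.g. N) become all-zero vectors.
    """
    alphabet = "ACGU"
    return [[1 if base == b else 0 for b in alphabet] for base in seq]

def z_curve(seq):
    """Cumulative Z-curve: one (x, y, z) point per sequence position.

    Each coordinate is a running difference of base counts up to that position.
    Bases outside the alphabet leave the counts unchanged.
    """
    counts = {"A": 0, "C": 0, "G": 0, "U": 0}
    points = []
    for base in seq:
        if base in counts:
            counts[base] += 1
        a, c, g, u = (counts[b] for b in "ACGU")
        points.append((
            (a + g) - (c + u),  # x: purine vs pyrimidine
            (a + c) - (g + u),  # y: amino vs keto
            (a + u) - (g + c),  # z: weak vs strong hydrogen bonding
        ))
    return points

seq = "GACUC"  # hypothetical toy sequence, not taken from the paper's data
print(one_hot(seq))
print(z_curve(seq))
```

In the model as described, each such per-position feature matrix would feed its own DCNN and MHSA branch, and only the branch outputs are concatenated; the raw features themselves are not fused.
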
