iPro2L-Kresidual: A High-Performance Promoter Identification Model for Sequence Nonlinearity and Context Mining



Abstract

Promoters are important non-coding DNA sequences that regulate gene expression, and promoter abnormalities are closely associated with diseases such as coronary heart disease, diabetes, and tumors. Accurate promoter identification is therefore highly significant. Existing single promoter identification models extract nonlinear features insufficiently and capture sequence context relationships poorly, which limits their classification performance. To overcome these shortcomings, this paper proposes a new model called iPro2L-Kresidual. The model integrates a residual structure with a KAN network to form a novel Kresidual module, which uses B-spline functions and residual networks to significantly enhance the nonlinear expression capability of sequence features. To fully capture sequence context relationships, iPro2L-Kresidual also improves the Transformer encoder module by replacing its linear processing method with gated recurrent units, so that it can extract both local and global context features of a sequence. In addition, a regularized label smoothing cross-entropy loss function is designed to ensure training stability and to prevent the model from becoming overly confident. In 5-fold cross-validation, the model achieved accuracies of 94.28% for promoter identification and 90.55% for promoter strength identification. On an independent dataset, the prediction accuracy reached 93.13%, further demonstrating the model's strong generalization ability. iPro2L-Kresidual thus provides a novel and effective model for promoter site prediction.
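To illustrate how label smoothing discourages overconfident predictions, the following is a minimal sketch of a standard label smoothing cross-entropy for a single sample. It is not the paper's loss: the abstract does not specify the regularization term, so none is included here, and the function name, the `epsilon` parameter, and its default of 0.1 are assumptions made for illustration.

```python
import math


def label_smoothing_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy of one sample against a smoothed target distribution.

    Hypothetical helper for illustration; `epsilon` is an assumed
    smoothing strength, not a value taken from the paper.
    """
    num_classes = len(logits)

    # Numerically stable log-softmax over the class logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    log_probs = [z - log_sum_exp for z in logits]

    # Smoothed target: epsilon / K on every class, plus the remaining
    # (1 - epsilon) mass on the true class.
    smoothed = [epsilon / num_classes] * num_classes
    smoothed[target] += 1.0 - epsilon

    return -sum(q * lp for q, lp in zip(smoothed, log_probs))


# A very confident prediction is penalized once the target is smoothed.
hard_loss = label_smoothing_cross_entropy([10.0, -10.0], target=0, epsilon=0.0)
smooth_loss = label_smoothing_cross_entropy([10.0, -10.0], target=0, epsilon=0.1)
print(hard_loss < smooth_loss)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. With a positive `epsilon`, pushing the logits further apart eventually increases the loss, which is the mechanism that keeps predicted probabilities away from 0 and 1.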
