Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings

Deep-STP:一种基于深度学习的利用词嵌入预测蛇毒素蛋白的方法

阅读:1

Abstract

Snake venom contains many toxic proteins that can destroy the circulatory system or nervous system of prey. Studies have found that these snake venom proteins have the potential to treat cardiovascular and nervous system diseases. Therefore, the study of snake venom protein is conducive to the development of related drugs. The research technologies based on traditional biochemistry can accurately identify these proteins, but the experimental cost is high and the time is long. Artificial intelligence technology provides a new means and strategy for large-scale screening of snake venom proteins from the perspective of computing. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specially, we utilized three different feature descriptors, namely g-gap, natural vector and word 2 vector, to encode snake toxin protein sequences. The analysis of variance (ANOVA), gradient-boost decision tree algorithm (GBDT) combined with incremental feature selection (IFS) were used to optimize the features, and then the optimized features were input into the deep learning model for model training. The results show that our model can achieve a prediction performance with an accuracy of 82.00% in 10-fold cross-validation. The model is further verified on independent data, and the accuracy rate reaches to 81.14%, which demonstrated that our model has excellent prediction performance and robustness.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。