Optimizing lipocalin sequence classification with ensemble deep learning models

Abstract

Deep learning (DL) has become a powerful tool for recognizing and classifying biological sequences. However, conventional single-architecture models often suffer from suboptimal predictive performance and high computational cost. To address these challenges, we present EnsembleDL-Lipo, an ensemble deep learning framework that combines Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) to improve the identification of lipocalin sequences. Lipocalins are multifunctional extracellular proteins involved in various diseases and stress responses. Because they share low sequence similarity and fall in the 'twilight zone' of sequence alignment, they are difficult to classify accurately. These challenges call for efficient computational methods that complement traditional, labor-intensive experimental approaches. EnsembleDL-Lipo addresses them by training a large ensemble of deep learning models on multiple feature representations derived from position-specific scoring matrices (PSSMs), optimizing classification performance across diverse sequence patterns. On the training dataset, the model achieved an accuracy (ACC) of 97.65%, recall of 97.10%, Matthews correlation coefficient (MCC) of 0.95, and area under the curve (AUC) of 0.99. Validation on an independent test set further confirmed its robustness, yielding an ACC of 95.79%, recall of 90.48%, MCC of 0.92, and AUC of 0.97. These results show that EnsembleDL-Lipo is a highly effective and computationally efficient tool for lipocalin sequence identification. It significantly outperforms existing methods and shows strong potential for applications in biomarker discovery.
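The abstract does not say how the PSSM features are encoded or how the CNN and DNN outputs are combined. The Python below is a minimal sketch under two stated assumptions. First, the PSSM is reduced to a fixed-length vector by averaging each of its 20 columns (one common encoding, not necessarily the paper's). Second, the ensemble uses plain soft voting, meaning it averages each model's predicted probability. The sketch also shows how the four reported metrics (ACC, recall, MCC, AUC) are computed. The function names and toy probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def pssm_column_means(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical encoding: average each PSSM column over all residues,
    turning an L x 20 matrix into a fixed-length 20-dimensional vector."""
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    n_rows, n_cols = len(pssm), len(pssm[0])
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Assumed combination rule: average positive-class probabilities across models."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = lipocalin."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def acc_recall_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, recall, mcc


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count 1/2); O(P*N), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    cnn_probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]  # toy outputs, not real data
    dnn_probs = [0.8, 0.6, 0.5, 0.3, 0.2, 0.2]
    ensemble = soft_vote([cnn_probs, dnn_probs])
    y_pred = [1 if p >= 0.5 else 0 for p in ensemble]
    print(acc_recall_mcc(y_true, y_pred), auc(y_true, ensemble))
```

In this toy run, averaging rescues the negative that the CNN alone scored at 0.6. Recall is measured on the positive class, so a single missed lipocalin lowers it sharply. This matches the gap between ACC and recall on the reported test set.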