Abstract
Accurate prediction of protein and peptide functions from amino acid sequences is essential for understanding biological processes and advancing biomolecular engineering. Owing to the limitations of experimental methods, computational approaches, particularly machine learning, have gained significant attention. However, many existing tools are task-specific and lack adaptability. Here, we propose the BERT-BiLSTM-Attention-TCN Protein Function Prediction Framework (BBATProt), a versatile framework for predicting protein and peptide functions. BBATProt leverages transfer learning with a pretrained Bidirectional Encoder Representations from Transformers (BERT) model to capture high-dimensional features. The custom network integrates a bidirectional long short-term memory (BiLSTM) network and a temporal convolutional network (TCN) to align with proteins' spatial characteristics, and combines local and global feature extraction via attention mechanisms to achieve more precise predictions. Evaluations demonstrate that BBATProt consistently outperforms state-of-the-art models in tasks such as hydrolytic catalysis, peptide bioactivity, and post-translational modification (PTM) site prediction. Specifically, BBATProt improves accuracy by 2.96% to 41.96% in antimicrobial peptide (AMP) prediction and by 0.64% to 23.54% in PTM prediction tasks. In terms of area under the receiver operating characteristic curve (AUC), improvements range from 0.71% to 40.51% for AMP prediction and from 0.62% to 27.82% for PTM prediction. Visualizations of feature evolution and refinement via attention mechanisms support the framework's interpretability, providing transparency into the feature-extraction process and offering deeper insights into the basis of property prediction.
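To make the role of attention concrete, the following is a minimal sketch of attention pooling over per-residue feature vectors. It is an illustration under stated assumptions, not the BBATProt implementation: the abstract does not specify the form of attention used, so dot-product scoring against a single query vector is assumed here, and the names `attention_pool`, `features`, and `query` are hypothetical. In a trained model the query would be learned and the features would come from the upstream layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Collapse per-residue feature vectors into one sequence-level vector.

    features: list of equal-length float lists, one per residue
              (hypothetical stand-ins for upstream layer outputs).
    query:    float list of the same length (learned in a real model;
              fixed by hand here as an assumption).
    Returns (pooled_vector, weights).
    """
    # Dot-product relevance score for each sequence position.
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    # Normalize scores into weights that sum to 1.
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the feature vectors, one dimension at a time.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three residues with two features each (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(features, query)
```

The returned `weights` are the kind of per-position quantity that can be visualized to see which residues contribute most to a prediction.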