Abstract
BACKGROUND: Accurate prediction of the neurotoxicity of peptides and proteins is critically important for the safety assessment of protein therapeutics and the development of protein-based drugs. Although experimental methods can reliably identify neurotoxic peptides and neurotoxins, they are labor-intensive, costly, and unsuitable for large-scale screening. Existing computational approaches are often limited by shallow feature engineering and suboptimal multimodal fusion strategies, which restrict their predictive accuracy and generalizability in real-world applications.

RESULTS: In this study, we propose BiToxNet, a deep learning framework that integrates evolutionary embeddings derived from a protein large language model with ten handcrafted biochemical descriptors through a bilinear attention network (BAN). This design models the cross-modal interactions and residue-level dependencies that are critical for neurotoxicity prediction. BiToxNet was evaluated on three datasets that differ in sequence length (Protein, Peptide, and Combined), where it achieved accuracies of 92.3%, 96.0%, and 92.7%, respectively, and consistently outperformed existing state-of-the-art methods. Ablation studies confirmed the contribution of both the evolutionary embeddings and the handcrafted features, as well as the critical role of the BAN in feature fusion. Visualization analyses using t-SNE and hierarchical clustering further showed that BiToxNet learns highly discriminative representations without relying on domain-specific prior knowledge. An additional evaluation on an external, imbalanced dataset confirmed the robustness and strong generalization capability of the proposed framework.

CONCLUSIONS: Overall, BiToxNet provides a powerful and generalizable computational framework for the accurate identification of neurotoxic peptides and proteins. By integrating evolutionary and biochemical information through bilinear attention, BiToxNet offers a valuable tool for neurotoxin screening and protein drug safety assessment, and presents a distinctive modeling strategy applicable to a wide range of biological sequence analysis tasks.
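The abstract states that BiToxNet fuses language-model embeddings with handcrafted descriptors through bilinear attention but does not give the formulation. The sketch below illustrates generic single-glimpse, low-rank bilinear attention pooling in pure Python so the idea can be inspected; it is not BiToxNet's implementation. Everything here is an assumption for illustration: the function and helper names are hypothetical, the dimensions and weights are random toy values, ReLU is used on the projections, the same projections `U` and `V` serve both attention and pooling, and each of the ten descriptor groups is assumed to have already been mapped to a fixed-length vector.

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a matrix W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def bilinear_attention_fusion(X, Y, U, V, p):
    """Single-glimpse low-rank bilinear attention pooling (illustrative sketch).

    X: n tokens of length d1 (assumed: per-residue language-model embeddings)
    Y: m tokens of length d2 (assumed: one vector per handcrafted descriptor)
    U: k x d1 and V: k x d2 projections into a shared rank-k space
    p: length-k vector scoring each (residue, descriptor) pair
    Returns the n x m attention map A (entries sum to 1) and a length-k fused vector f.
    """
    PX = [relu(matvec(U, x)) for x in X]
    PY = [relu(matvec(V, y)) for y in Y]
    # Bilinear logit for pair (i, j): p^T (PX[i] * PY[j]), elementwise product.
    logits = [[sum(pk * a * b for pk, a, b in zip(p, u, v)) for v in PY] for u in PX]
    # Softmax over all n*m pairs (max subtracted for numerical stability).
    top = max(max(row) for row in logits)
    exps = [[math.exp(s - top) for s in row] for row in logits]
    total = sum(sum(row) for row in exps)
    A = [[e / total for e in row] for row in exps]
    # Attention-weighted bilinear pooling: f_c = sum_ij A_ij * PX[i][c] * PY[j][c].
    # (Simplification: the same projections U, V serve attention and pooling.)
    f = [sum(A[i][j] * PX[i][c] * PY[j][c]
             for i in range(len(PX)) for j in range(len(PY)))
         for c in range(len(p))]
    return A, f


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions (hypothetical): 5 residues, 10 descriptor vectors, rank k = 6.
random.seed(0)
n, m, d1, d2, k = 5, 10, 8, 4, 6
X, Y = random_matrix(n, d1), random_matrix(m, d2)
U, V, p = random_matrix(k, d1), random_matrix(k, d2), random_matrix(1, k)[0]
A, f = bilinear_attention_fusion(X, Y, U, V, p)
print(len(A), len(A[0]), len(f))  # 5 10 6
```

In a trained model of this kind, the fused vector `f` would typically feed a classification head, and the attention map `A` indicates which residue/descriptor pairs dominate the fused representation, which is what lets a bilinear design capture cross-modal interactions that simple concatenation does not.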