Abstract
Proteins are the basic building blocks of life and perform fundamental biological functions. Predicting protein properties from amino acid sequences and 3D structures has become a key approach to accelerating drug development. In this study, we propose SST-ResNet, a novel sequence- and structure-based framework that combines the multimodal language model ProSST with a multi-scale information integration module. The framework is designed to capture the latent relationships between protein sequences and structures, thereby improving joint prediction performance. Our method outperforms previous joint prediction models on Enzyme Commission (EC) number and Gene Ontology (GO) term prediction tasks. Furthermore, we demonstrate that multi-scale information integration is necessary for both sequence and structure data and illustrate its strong performance on key tasks. We anticipate that this framework can be extended to a broader range of protein property prediction problems, ultimately facilitating drug development.