Abstract
BACKGROUND: Proteins are the core functional carriers of life activities, and the quality of their representation directly affects the accuracy of downstream functional prediction. In recent years, multimodal deep learning methods have significantly improved protein representation learning by fusing sequence, structure, and chemical characteristics. However, current research still faces two core challenges: first, the mechanism by which structural information guides multimodal feature interaction has not been fully explored; second, existing fusion strategies mostly use static weight allocation, which adapts poorly to the dynamic correlation between sequence and structural features and therefore limits accuracy in identifying key functional residues. RESULTS: We propose ProGraphTrans, a multimodal dynamic collaborative framework for protein representation learning. ProGraphTrans builds a dynamic attention multimodal fusion mechanism and captures local sequential patterns through a multi-scale convolutional neural network. CONCLUSIONS: Experimental results on four downstream protein tasks show that ProGraphTrans outperforms other methods across various metrics and offers excellent interpretability, demonstrating its effectiveness as a protein representation method.