Abstract
BACKGROUND: Protein-protein interactions regulate the dynamic operation of intracellular molecular networks and serve as the molecular basis for understanding protein functions and disease mechanisms. Recently, several computational methods for predicting protein-protein interaction sites (PPIs) have been presented as alternatives to costly and labor-intensive traditional experiments. However, existing methods generally ignore the inherent hierarchical structure of protein chains. Furthermore, the equivariance of the graph structure under spatial transformations is often neglected when graph neural networks are applied to this task. Therefore, accurately identifying PPIs remains challenging.

RESULTS: In this work, we propose EGCPPIS, an end-to-end GNN-based computational method for efficiently identifying protein-protein interaction sites. First, we construct a hierarchical graph representation of the protein chain, comprising a residue-level graph and an atom-level graph. Next, EGCPPIS uses an E(n) Equivariant Graph Neural Network (EGNN) module to learn residue-level embeddings with equivariant features. After extracting atom-level embeddings with a GraphSAGE module, we introduce a contrastive learning strategy to integrate the hierarchical graph features. This strategy enables us to learn consistent embeddings between the residue-level and atom-level representations. Finally, the fused embeddings are weighted using an improved gated multi-head attention mechanism.

CONCLUSION: Comprehensive evaluation results on multiple datasets demonstrate that EGCPPIS significantly outperforms state-of-the-art methods. Extensive comparative experiments and case studies further confirm that EGCPPIS can reveal the decision-making patterns in the prediction of PPIs, facilitating the discovery of potential PPIs. The original datasets and code of EGCPPIS are available at https://github.com/GuicongSun/EGCPPIS.
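
To make the equivariance property mentioned above concrete, the following is a minimal NumPy sketch of a single generic E(n)-equivariant message-passing layer. It is not the EGCPPIS implementation: the function names `make_params` and `egnn_layer`, the single-matrix "MLPs" with random weights, the tanh activations, and the fully connected residue graph are all assumptions made for illustration. The sketch only shows that residue features stay unchanged and coordinates transform consistently when the input structure is rotated and translated.

```python
import numpy as np


def make_params(feat_dim, hidden_dim, seed=0):
    """Random weights standing in for the learned edge, coordinate and node MLPs (assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W_e": rng.normal(scale=0.1, size=(2 * feat_dim + 1, hidden_dim)),
        "W_x": rng.normal(scale=0.1, size=(hidden_dim, 1)),
        "W_h": rng.normal(scale=0.1, size=(feat_dim + hidden_dim, feat_dim)),
    }


def egnn_layer(h, x, params):
    """One equivariant layer on a fully connected graph (illustrative assumption).

    h: (n, feat_dim) node features, x: (n, 3) node coordinates.
    Returns updated features (invariant) and coordinates (equivariant).
    """
    n, feat_dim = h.shape
    diff = x[:, None, :] - x[None, :, :]                # (n, n, 3), x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1, keepdims=True)   # (n, n, 1), invariant to rotation/translation

    # Edge messages depend only on invariant quantities: h_i, h_j, squared distance.
    h_i = np.broadcast_to(h[:, None, :], (n, n, feat_dim))
    h_j = np.broadcast_to(h[None, :, :], (n, n, feat_dim))
    m = np.tanh(np.concatenate([h_i, h_j, dist2], axis=-1) @ params["W_e"])

    # Remove self-loops.
    mask = 1.0 - np.eye(n)[:, :, None]
    m = m * mask

    # Coordinates move along relative position vectors, scaled by an invariant weight.
    coord_w = np.tanh(m @ params["W_x"])                # (n, n, 1)
    x_new = x + np.sum(diff * coord_w, axis=1) / (n - 1)

    # Node features are updated from aggregated invariant messages (residual form).
    m_i = m.sum(axis=1)
    h_new = h + np.tanh(np.concatenate([h, m_i], axis=-1) @ params["W_h"])
    return h_new, x_new


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h = rng.normal(size=(5, 4))          # 5 toy "residues", 4 features each
    x = rng.normal(size=(5, 3))          # toy 3D coordinates
    params = make_params(feat_dim=4, hidden_dim=8)

    # Random orthogonal matrix (rotation or reflection) and translation.
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    t = rng.normal(size=(1, 3))

    h1, x1 = egnn_layer(h, x, params)
    h2, x2 = egnn_layer(h, x @ Q.T + t, params)

    print("features invariant:     ", np.allclose(h1, h2))
    print("coordinates equivariant:", np.allclose(x1 @ Q.T + t, x2))
```

In a real residue-level graph, edges would more plausibly be restricted to spatial neighbors (for example by a distance cutoff) rather than connecting every pair, and the weights would be learned; those choices do not affect the symmetry argument illustrated here.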