Abstract
Recent advances in topology-based modeling have greatly improved molecular prediction tasks, particularly protein-ligand binding affinity prediction. However, predicting the binding free energy of protein-protein interactions (PPIs) remains significantly more challenging, owing to the ineffective use of topological features and the lack of reliable datasets. In this work, we propose a persistent-Laplacian machine learning framework centered on the Persistent-Laplacian Neural Network (PLNet), which encodes each protein chain at the binding interface using both persistent Laplacian-based features and protein language model embeddings. PLNet achieves a Pearson correlation of 0.80 under leave-one-protein-out cross-validation on our newly assembled benchmark dataset, P2P, which comprises 6886 protein complexes drawn from existing sources. For comparison, we also implement a gradient-boosting decision tree model under the same settings. Results against this baseline highlight the advantage of PLNet in capturing complex topology-aware descriptors for PPI prediction.