Abstract
Understanding the biochemical mechanisms that drive protein-protein interactions is challenging and has traditionally required mutation studies and expert interpretation of protein structures. A method that can generate mechanistic explanations from the biochemical properties and contributions of interactions would enhance our ability to study protein-protein interactions. In this study, we present a novel approach to extracting mechanistic insights from machine learning models. We manually annotated a dataset of 1225 mutation experiments with mechanistic insights focused on electrostatic, hydrogen-bonding, steric, and hydrophobic interactions. To demonstrate a preliminary process for evaluating mechanism prediction models, we extracted SHAP values for features that are representative of protein binding mechanisms from a Gradient-Boosting Tree (GBT) model trained to predict binding affinity. We found that the SHAP values generally agreed with the annotated mechanisms in our dataset, especially for electrostatic and steric features. We also found that hydrophobicity consistently played a dominant role and hydrogen bonds consistently played a secondary role, challenging conventional assumptions about the role of these interactions.
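As an illustration of the comparison described above, the following minimal sketch aggregates per-feature SHAP values into the four mechanism categories and checks whether the top-ranked category matches a manual annotation. It is not the procedure used in this study: the feature names, the feature-to-mechanism mapping, the use of summed absolute SHAP values, and the example numbers are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from model feature names to mechanism categories.
# Neither the feature names nor the grouping come from the study itself.
FEATURE_TO_MECHANISM = {
    "delta_charge": "electrostatic",
    "delta_hbond_count": "hydrogen bonding",
    "delta_volume": "steric",
    "delta_hydropathy": "hydrophobic",
    "delta_buried_nonpolar_area": "hydrophobic",
}


def mechanism_contributions(shap_values):
    """Sum absolute SHAP values per mechanism category for one mutation.

    Summing absolute values is an assumed aggregation rule, not one
    stated in the text.
    """
    totals = defaultdict(float)
    for feature, value in shap_values.items():
        totals[FEATURE_TO_MECHANISM[feature]] += abs(value)
    return dict(totals)


def ranked_mechanisms(shap_values):
    """Return mechanism categories ordered from largest to smallest contribution."""
    totals = mechanism_contributions(shap_values)
    return sorted(totals, key=totals.get, reverse=True)


def agrees_with_annotation(shap_values, annotated, top_k=1):
    """Check whether the annotated mechanism is among the top_k ranked categories."""
    return annotated in ranked_mechanisms(shap_values)[:top_k]


# Made-up SHAP values for a single hypothetical mutation.
example = {
    "delta_charge": -0.5,
    "delta_hbond_count": 0.125,
    "delta_volume": 0.25,
    "delta_hydropathy": 0.03125,
    "delta_buried_nonpolar_area": 0.0625,
}

print(ranked_mechanisms(example))
print(agrees_with_annotation(example, "electrostatic"))
```

Averaging `agrees_with_annotation` over all annotated mutations would give one possible agreement rate per mechanism; raising `top_k` relaxes the criterion for mutations where several mechanisms plausibly contribute.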