Abstract
P-glycoprotein (P-gp), a key member of the ATP-binding cassette (ABC) transporter family, plays a significant role in drug absorption and distribution by binding to diverse xenobiotics and actively transporting them out of cells. Given P-gp's widespread expression, including its critical presence at the blood-brain barrier, identifying whether a compound functions as a P-gp substrate or inhibitor is essential in drug development to evaluate its ability to penetrate the central nervous system. However, most studies on P-gp focus on inhibitor models rather than substrate models. This study presents a robust graph neural network approach to predict P-gp substrates, leveraging graph convolutional networks, AttentiveFP, and an ensemble model. Using a dataset of 1995 drug molecules (1202 substrates, 793 nonsubstrates), AttentiveFP outperformed traditional methods, achieving an ROC-AUC of 0.848 and an accuracy of 0.815. Integrated gradient analysis identified 20 key substructures associated with P-gp substrates. Most notably, the top four confer a >70% probability of substrate classification and can be used for quick assessment in the future. This interpretable framework enhances P-gp prediction and broader drug development efforts.