Abstract
The cytochrome P450 (CYP) superfamily metabolises a wide range of compounds, and drug-induced CYP inhibition can lead to adverse drug interactions. Identifying potential CYP inhibitors is therefore crucial for safe drug administration. This study investigated the application of deep learning to the prediction of CYP inhibition, focusing on the challenge posed by the limited datasets available for the CYP2B6 and CYP2C8 isoforms. To address this limitation, we leveraged larger datasets for related CYP isoforms, compiling IC50 values from public databases for 12,369 compounds that target seven CYP isoforms. We constructed four types of model: single-task, fine-tuning, multitask, and multitask with imputation of missing values. Notably, the multitask models with data imputation showed a significant improvement in CYP inhibition prediction over the single-task models. Using the most accurate prediction models, we evaluated the inhibitory activity of approved drugs against CYP2B6 and CYP2C8. Among the 1,808 approved drugs analysed, our multitask models with data imputation identified 161 potential inhibitors of CYP2B6 and 154 potential inhibitors of CYP2C8. This study underscores the potential of multitask deep learning, particularly a graph convolutional network combined with data imputation, to improve the accuracy of CYP inhibition prediction when data are limited.
Scientific contribution
This study demonstrates that accurate prediction models can be constructed even from small datasets by making effective use of related data. In addition, imputing the missing values significantly improved the prediction accuracy for CYP2B6 and CYP2C8 inhibition.
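The following minimal sketch illustrates the general idea of multitask learning with missing labels and label imputation. It is not the models or the imputation procedure used in this study: the task names, the binary inhibitor/non-inhibitor encoding, the 0.5 threshold, and the helper functions `masked_bce` and `impute_missing` are all assumptions introduced for illustration. In a compound-by-isoform label matrix, most compounds have been measured against only some isoforms. One option is to compute the loss over the measured entries only; another is to fill the unmeasured entries with model predictions before retraining.

```python
import math

# Hypothetical task list; the study covers seven isoforms, only two are named here.
TASKS = ["CYP2B6", "CYP2C8", "related_isoform"]

def masked_bce(labels, probs):
    """Mean binary cross-entropy over measured entries only (None = not measured)."""
    total, count = 0.0, 0
    for row_y, row_p in zip(labels, probs):
        for y, p in zip(row_y, row_p):
            if y is None:
                continue  # unmeasured compound/isoform pair contributes no loss
            p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count if count else 0.0

def impute_missing(labels, probs, threshold=0.5):
    """Fill unmeasured entries with thresholded predictions; keep measured labels."""
    return [
        [(1 if p >= threshold else 0) if y is None else y
         for y, p in zip(row_y, row_p)]
        for row_y, row_p in zip(labels, probs)
    ]

# Toy data: two compounds, three tasks, 1 = inhibitor, 0 = non-inhibitor.
labels = [[1, None, 0],
          [None, 0, 1]]
# Placeholder predicted probabilities standing in for a trained model's output.
probs = [[0.9, 0.2, 0.1],
         [0.7, 0.3, 0.8]]

loss = masked_bce(labels, probs)
imputed = impute_missing(labels, probs)
```

After imputation the label matrix is dense, so every compound contributes to every task during a second round of training. Whether the imputed labels help depends on the quality of the predictions used to fill them.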