Abstract
Monotherapy cancer drug response prediction (DRP) models predict the response of a cell line to a given drug. Analyzing these models' performance includes assessing their ability to predict the response of cell lines to new drugs, i.e., drugs that are not in the training set. Drug-blind prediction shows greatly diminished performance or outright failure across a wide range of model architectures and large pharmacogenomic datasets. Drug-blind failure is hypothesized to be caused by the relatively limited set of drugs present in these datasets. The time and cost associated with further cell line experiments are significant, and it is impossible to predict beforehand how much data would be enough to overcome drug-blind failure. We must therefore define how current data contributes to drug-blind failure before attempting to remedy it with further data collection. In this work, we quantify the extent to which drug-blind generalizability relies on mechanistic overlap of drugs between training and testing splits. We first show that the majority of mixed-set DRP model performance can be attributed to drug overfitting, which likely inhibits generalization and prevents accurate analysis. Then, by specifically probing the drug-blind ability of models, we reveal that the sources of generalizable drug features are confined to shared mechanisms of action and related pathways. Furthermore, we observe that, for certain mechanisms, we can significantly improve performance by limiting the training of models to a single mechanism compared to training on all drugs simultaneously. Across the multiple model architectures examined in this paper, we observe that drug-blind performance is a poor benchmark for DRP because it describes dataset behavior rather than model behavior.
Our investigation shows that these deep learning models, trained on large monotherapy cell line panels, describe the mechanisms of action of drugs more accurately than they describe their advertised connection to broader cancer biology.
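To make the distinction between the two evaluation settings concrete, the following is a minimal sketch of a mixed-set split and a drug-blind split over (cell line, drug, response) records. It is an illustration under stated assumptions, not the splitting procedure used in this work: the function names, the record layout, the toy data, and the 20% test fraction are all hypothetical.

```python
import random


def mixed_set_split(pairs, test_fraction=0.2, seed=0):
    """Split records at random; the same drug may appear in train and test."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def drug_blind_split(pairs, test_fraction=0.2, seed=0):
    """Hold out whole drugs, so no test drug is ever seen during training."""
    rng = random.Random(seed)
    # Assumed record layout: (cell_line, drug, response).
    drugs = sorted({drug for _, drug, _ in pairs})
    rng.shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])
    train = [p for p in pairs if p[1] not in held_out]
    test = [p for p in pairs if p[1] in held_out]
    return train, test


# Hypothetical toy panel: 4 cell lines screened against 5 drugs.
pairs = [(f"cell_{c}", f"drug_{d}", 0.0) for c in range(4) for d in range(5)]

train, test = drug_blind_split(pairs)
train_drugs = {drug for _, drug, _ in train}
test_drugs = {drug for _, drug, _ in test}

mixed_train, mixed_test = mixed_set_split(pairs)
```

In the drug-blind split the training and test drug sets are disjoint by construction, whereas the mixed-set split only separates individual records, so a model can score well by memorizing per-drug behavior.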