Abstract
Diverse machine learning methods promise to forecast gene expression changes in response to novel genetic perturbations. However, the accuracy of these methods is not well characterized. We created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that implements or interfaces with a wide variety of methods. We used our platform to assess methods, parameters, and sources of auxiliary data, finding that expression forecasting methods rarely outperform simple baselines. Our platform will serve as a resource for improving methods and for identifying contexts in which expression forecasting can succeed.