Abstract
Computational models in systems biology are often underdetermined: there is little data relative to the size and complexity of the model. This lack of data stems primarily from limits on our ability to observe specific biological systems, and it restricts the utility of computational models. However, the number of experimental databases in biology is growing. While these databases provide more observations, those observations often do not match the system of interest exactly. For example, database measurements might be collected under different experimental conditions or at a different scale than the system of interest. Here, we investigate what information can be gleaned by generalizing databases across these differences in the context of modeling a specific system: cell signaling. Ultimately, our goal is to better determine models of specific systems, thereby increasing their utility. To do this, we propose a novel, multiscale, probabilistic framework. We use this framework to integrate measurements of protein structure from the Protein Data Bank and measurements of amino acid sequence from the Universal Protein Resource into the parameter inference of cell signaling models. We then quantify what information is gained from these measurements when modeling cell signaling. We chose dynamic cell signaling models as the context for this investigation because experimental measurements of the variables of interest, protein dynamics, are still quite limited. We find that measurements from these databases can be integrated to significantly improve parameter estimation in signaling models. The impact of sequence and structure measurements on model predictions depends on the sensitivity of each prediction to perturbations in the parameter values. Overall, this study demonstrates that measurements of protein structure and amino acid sequence can be leveraged to better inform parameters in models of cell signaling.