Abstract
Biological data are subject to both intrinsic and extrinsic noise and to variability between experimental replicates. That same stochasticity and heterogeneity can carry information about underlying biochemical mechanisms but, if not incorporated into modeling and probabilistic inference, can also bias parameter estimates and misguide predictions and, subsequently, experiment design. Mechanistic inference typically requires lengthy simulations (e.g., with the Stochastic Simulation Algorithm (SSA)); approximations to chemical master equation (CME) solutions that lack rigorous error tracking; or deterministic averaging that lacks the complexity necessary to reflect the data. We introduce the Stochastic System Identification Toolkit (SSIT), a fast, flexible, open-source software package available on GitHub that makes use of MATLAB's efficient and diverse computational architecture. The SSIT is designed for building, simulating, and solving chemical reaction models using ordinary differential equations (ODEs), moments, SSA, Finite State Projection truncations of the CME, or hybrid methods; performing sensitivity analysis and quantifying Fisher information; fitting parameters using likelihood-based or Bayesian methods; handling experimental noise and measurement errors using probabilistic distortion operators; and designing sequential experiments that help users save time and resources while extracting as much information as possible from their data. The SSIT also offers advanced modeling tools, including model reduction methods for increased efficiency and joint fitting of models and datasets with overlapping reactions or parameters. For ease and speed of use, the SSIT provides a graphical user interface and ready-made, adaptable pipelines that can be run in the background from the command line or on high-performance computing clusters.
We demonstrate features of the SSIT on two experimental datasets. The first consists of published mRNA count data, measured by single-cell single-molecule fluorescence in situ hybridization, that reflect the response of Saccharomyces cerevisiae yeast cells to osmotic shock. The second consists of single-cell RNA sequencing measurements of 151 activating genes in breast cancer cells following treatment with dexamethasone.