Abstract
INTRODUCTION: Animal studies have historically informed toxicological testing and safety assessments. However, the variability of their quantitative and qualitative results has received limited assessment. Biological variability, experimental differences, interpretation of categorical endpoints, and data availability and curation approaches all contribute to the quantified variability. METHODS: A literature review was conducted to identify publications describing variability analyses for in vivo toxicology studies. Variability analyses were evaluated and summarized for a range of toxicological endpoints: ocular irritation, dermal sensitization and irritation, acute oral and inhalation lethality, subchronic and chronic toxicity, carcinogenicity, neurotoxicity (including developmental neurotoxicity, DNT), endocrine endpoints, and genotoxicity. RESULTS: This review summarizes published investigations of variability within mammalian toxicological studies, most of which were conducted in accordance with health effects test guidelines. The results suggest that the replicability of in vivo toxicological guideline studies varies widely by study type, endpoint complexity, and classification approach. DISCUSSION: Although every test system has inherent variability, understanding its sources and its impact on study interpretation helps ensure that appropriate confidence is placed in the test method. Such information also aids in establishing relevant metrics that serve as baselines for characterizing the performance of new approach methodologies (NAMs). Future evaluations of NAMs should be contextualized with estimates of the uncertainty and variance in traditional study data when demonstrating "better" performance relative to traditional testing approaches. A robust understanding of guideline study performance is important for risk assessment, where the goal is to find species-relevant NAMs that perform at least as well as existing bioassays.