Abstract
The complexity of chemical mixtures in the environment challenges in-depth risk assessment because of the diversity of compounds in use and the lack of experimental toxicity data. In silico models can fill data gaps for compounds with unknown toxic potency. However, quantitative structure-activity relationship (QSAR) models typically distinguish only between active and inactive compounds and provide no information about the level of activity. In this study, a QSAR model that classifies compounds into multiple activity levels was developed to address data gaps in the aryl hydrocarbon receptor (AhR)-mediated activity of compounds commonly detected in environmental samples. Its practical applicability was demonstrated on highly complex mixtures of aquatic pollutants from the Joint Danube Survey, where it was used to prioritize the most relevant compounds for experimental assessment. The model showed high sensitivity and specificity, with weighted overall accuracy ranging from 77% to 87%. Experimental and QSAR-predicted data were combined to calculate site-specific AhR activity, which was compared with the overall AhR activity detected by in vitro bioassays. Experimental testing confirmed that the QSAR model can identify compounds with high AhR activity, including benzonaphthothiophene, perylene, acridone, and triphenylene, and can prioritize the most relevant suspected effect drivers. The model thus predicts toxic potency and supports the prioritization of potentially bioactive compounds by activity level. Our study also shows that several factors must be considered when QSAR models are used for compound prioritization: cytotoxicity, solubility, the high rate of false positives for low-toxicity compounds, and the model's applicability domain.