Abstract
Stacking ensemble learning is a method to improve model generalization and robustness. Deep neural networks have demonstrated significant potential for predicting chemical properties due to their effectiveness in learning complex patterns within the chemical space. Nevertheless, an individual model may rely on a single molecular feature set that might not explicitly explain all of the relationships between drugs and targets. Integrating a stacking ensemble with deep learning (DL) and various molecular features could potentially enhance the learning process and improve the ability to capture complex relationships between molecular structures and bioactivities. Chemicals binding to thyroid peroxidase (TPO) are associated with thyroid dysfunction, highlighting the importance of assessing their potential risks to human health and the environment. In this study, we developed a novel stacking ensemble neural network model to predict TPO inhibitory activity. This model integrates convolutional neural networks, bidirectional long short-term memory networks, and attention mechanisms combined with top-performing molecular fingerprints to generate three probability features. These features were used as inputs in a meta-decision model, enhancing learning probability. The meta-model was validated through y-randomization, ensuring that the model does not produce outputs randomly. The applicability domain of this model was also assessed to affirm the reliability and trustworthiness of each prediction. The final attention-based meta-model achieved a recall of 0.55, specificity of 0.95, Matthews correlation coefficient of 0.56, area under the curve of 0.85, balanced accuracy of 0.75, and precision of 0.70. Furthermore, the developed model was generalized to other external test sets, effectively predicting TPO inhibition and identifying potentially toxic compounds from a selected Thai indigenous vegetable. These findings will contribute to the application of stacking ensemble neural networks in the toxicity screening of chemical compounds, enhancing their learning ability to capture more diverse chemical risk assessments.