Machine Learning-Based QSAR Screening of Colombian Medicinal Flora for Potential Antiviral Compounds Against Dengue Virus: An In Silico Drug Discovery Approach



Abstract

Background/Objectives: Colombia harbors exceptional plant diversity, comprising over 31,000 formally identified species, of which approximately 6000 are classified as useful plants. Among these, 2567 species possess documented food and medicinal applications, with several traditionally utilized for managing febrile illnesses. Despite the global burden of dengue virus infection affecting millions annually, no specific antiviral therapy has been established. This study aimed to identify potential anti-dengue compounds from Colombian medicinal flora through machine learning-based quantitative structure-activity relationship (QSAR) modeling.

Methods: An optimized XGBoost algorithm was developed through Bayesian hyperparameter optimization (Optuna, 50 trials) and trained on 2034 ChEMBL-derived activity records with experimentally validated anti-dengue activity (IC50/EC50). The model incorporated 887 molecular features comprising 43 physicochemical descriptors and 844 ECFP4 fingerprint bits selected via variance-based filtering. IC50 and EC50 endpoints were modeled independently based on their pharmacological distinction and negligible correlation (r = -0.04, p = 0.77). Through a systematic literature review, 2567 Colombian plant species from the Humboldt Institute's official checklist were evaluated (2501 after removing duplicates and infraspecific taxa), identifying 358 with documented antiviral properties. Phytochemical analysis of 184 characterized species yielded 3267 unique compounds for virtual screening. A dual-endpoint classification strategy categorized compounds into nine activity classes based on combined potency thresholds (Low: pActivity ≤ 5.0, Medium: 5.0 < pActivity ≤ 6.0, High: pActivity > 6.0).

Results: The optimized model achieved robust performance (Matthews correlation coefficient: 0.583; ROC-AUC: 0.896), validated through hold-out testing (MCC: 0.576) and Y-randomization (p < 0.01).
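
The dual-endpoint classification described in the Methods can be illustrated with a minimal Python sketch. It uses the thresholds stated above; the function names, the nM input unit, and the "IC50 level first, EC50 level second" label order are assumptions made for illustration, not details confirmed by the abstract.

```python
import math


def p_activity(conc_nm: float) -> float:
    """Convert a concentration in nM to pActivity = -log10(concentration in M).

    Assumption: input is in nanomolar, so pActivity = 9 - log10(nM).
    """
    if conc_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(conc_nm)


def potency_level(p: float) -> str:
    """Map a pActivity value to the Low / Medium / High bins from the abstract."""
    if p > 6.0:
        return "High"
    if p > 5.0:
        return "Medium"
    return "Low"


def dual_class(pic50: float, pec50: float) -> str:
    """Combine both endpoints into one of 3 x 3 = 9 classes.

    Assumption: the label is written as '<IC50 level>-<EC50 level>'.
    """
    return f"{potency_level(pic50)}-{potency_level(pec50)}"


if __name__ == "__main__":
    # Predicted values reported for Incartine in the abstract.
    print(dual_class(6.84, 6.13))
```

Because the bins are closed on the upper side for Low and Medium, a compound with pActivity exactly 6.0 is Medium, and only values strictly above 6.0 count as High.
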
Virtual screening identified 276 compounds (8.4%) with high predicted potency for both endpoints ("High-High"). Structural novelty analysis revealed that all 276 compounds exhibited Tanimoto similarity < 0.5 to the training set (median: 0.214), representing 145 unique Murcko scaffolds of which 144 (99.3%) were absent from the training data. Application of drug-likeness filtering (QED ≥ 0.5) and applicability domain assessment identified 15 priority candidates. In silico ADMET profiling revealed favorable pharmaceutical properties, with Incartine (pIC50: 6.84, pEC50: 6.13, QED: 0.83), Bilobalide (pIC50: 6.78, pEC50: 6.07, QED: 0.56), and Indican (pIC50: 6.73, pEC50: 6.11, QED: 0.51) exhibiting the highest predicted potencies.

Conclusions: This systematic computational screening of Colombian medicinal flora demonstrates the untapped potential of regional biodiversity for anti-dengue drug discovery. The identified candidates, representing structurally novel chemotypes, are prioritized for experimental validation.
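
The Tanimoto-based novelty check mentioned in the Results can be sketched as follows. This is a simplified illustration, not the authors' pipeline: fingerprints are represented here as plain sets of on-bit indices (in practice they would typically come from a cheminformatics toolkit), and the comparison against the training set is assumed to use the nearest training neighbour, which the abstract does not state explicitly. All function names are hypothetical.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: FrozenSet[int],
                   training: Iterable[FrozenSet[int]]) -> float:
    """Similarity of the query to its nearest neighbour in the training set."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def is_novel(query: FrozenSet[int],
             training: Iterable[FrozenSet[int]],
             threshold: float = 0.5) -> bool:
    """Flag a compound as structurally novel if its nearest-neighbour
    similarity is strictly below the threshold (0.5 in the abstract)."""
    return max_similarity(query, training) < threshold


if __name__ == "__main__":
    # Toy fingerprints, invented purely for illustration.
    train = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
    candidate = frozenset({3, 4, 20, 21, 22, 23})
    print(max_similarity(candidate, train), is_novel(candidate, train))
```

Using the maximum rather than the mean similarity is the stricter choice: a candidate only counts as novel if no single training compound resembles it closely.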
