Binding Activity Classification of Anti-SARS-CoV-2 Molecules using Deep Learning Across Multiple Assays

利用深度学习对多种检测方法中的抗SARS-CoV-2分子结合活性进行分类

阅读:1

Abstract

BACKGROUND: The coronavirus disease-2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), has urgently necessitated effective therapeutic solutions, with a focus on rapidly identifying and classifying potential small-molecule drugs. Given traditional methods’ labor-intensive and time-consuming nature, deep learning has emerged as an essential tool for efficiently processing and extracting insights from complex biological data. AIMS: To utilize deep learning techniques, particularly deep neural networks (DNN) enhanced with the synthetic minority oversampling technique (SMOTE), to enhance the classification of binding activities in anti-SARS-CoV-2 molecules across various bioassays. METHODS: We used 11 bioassay datasets covering various SARS-CoV-2 interactions and inhibitory mechanisms. These assays ranged from spike-ACE2 protein-protein interaction to ACE2 enzymatic activity and 3CL enzymatic activity. To address the prevalent class imbalance in these datasets, the SMOTE technique was employed to generate new samples for the minority class. In our model-building approach, we divided the dataset into 80% training and 20% test sets, reserving 10% of the training set for validation. Our approach involved employing a DNN that integrates ReLU and sigmoid activation functions, incorporates batch normalization, and uses Adam optimization. The hyperparameters and architecture of the DNN were optimized through various tests on layers, minibatch sizes, epoch sizes, and learning rates. A 40% dropout rate was incorporated to mitigate overfitting. For model evaluation, we computed performance metrics, such as balanced accuracy (BACC), precision, recall, F1 score, Matthews’ correlation coefficient (MCC), and area under the curve (AUC). RESULTS: The performance of the DNN across 11 bioassay test sets revealed varying outcomes, significantly influenced by the ratios of active-to-inactive compounds. Assays, such as AlphaLISA and CoV-PPE, demonstrated robust performance across various metrics, including BACC, precision, recall, and AUC, when configured with more balanced ratios (1:3 and 1:1, respectively). This suggests the effective identification of active compounds in both cases. In contrast, assays with higher imbalance ratios, such as 3CL (1:38) and cytopathic effect (1:15), demonstrated higher recall but lower precision, highlighting challenges in accurately identifying active compounds among numerous inactive compounds. However, even in these challenging settings, the model achieved favorable BACC and recall scores. Overall, the DNN model generally performed well, as indicated by the BACC, MCC, and AUC values, especially when considering the degree of dataset imbalance in each assay. CONCLUSION: This study demonstrates the significant impact of deep learning, particularly DNN models enhanced with SMOTE, in improving the identification of active compounds in bioassay datasets for COVID-19 drug discovery, outperforming traditional machine learning models. Furthermore, this study highlights the efficacy of advanced computational techniques in addressing high-throughput screening data imbalances.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。