Abstract
Influenza virus causes high morbidity and mortality and remains a global health threat. Traditional antiviral screening is costly, whereas machine learning could make antiviral drug discovery more effective. Leveraging a large-scale, in-house dataset of antiviral activity against H1N1, we developed a small molecule compound seeker (SMCseeker) framework for identifying highly active anti-H1N1 agents. Data augmentation and a multi-head attention mechanism were used to address the extreme data imbalance and improve the model's generalization ability. After cleaning of data from 52,800 compounds, 18,093 structure-activity signatures were selected for training, with another 3,876 validation and 3,879 unseen data points used to verify the model's generalization ability. H1N1-SMCseeker performed stably on the validation dataset, the unseen dataset, and one experiment, with positive predictive values (PPVs) of 70.59%, 70.59%, and 70.65%, respectively. H1N1-SMCseeker can therefore effectively identify anti-H1N1 compounds. The SMCseeker framework could potentially be repurposed for discovering antivirals against other medically important viruses.
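For readers unfamiliar with the metric, PPV is the fraction of compounds predicted active that are truly active, TP / (TP + FP). The following is a minimal sketch of that calculation. The function name `positive_predictive_value` and the toy labels are illustrative assumptions, not the study's code or data.

```python
def positive_predictive_value(y_true, y_pred):
    """Return TP / (TP + FP) for binary labels (1 = active, 0 = inactive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positives: predicted active and actually active.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    # False positives: predicted active but actually inactive.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        # No compound was predicted active, so PPV is undefined.
        return float("nan")
    return tp / (tp + fp)


# Toy labels for illustration only (hypothetical, not from the H1N1 dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(f"PPV = {positive_predictive_value(y_true, y_pred):.2%}")
```

PPV suits a heavily imbalanced screening setting because it measures how many of the compounds flagged for follow-up are genuine hits, regardless of how many inactive compounds the dataset contains.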
