Abstract
Virulence factors (VFs) are molecules that enable pathogens to infect a host and cause disease. They help pathogens evade the host's immune defenses and promote the progression of infection through various mechanisms. With the increasing prevalence of antibiotic-resistant strains and the emergence of new and re-emerging infectious agents, accurate classification of VFs has become increasingly important. This study presents PLM-GNN, a dual-channel model for classifying VFs into the seven most numerous types. The structure channel employs a geometric graph neural network to capture the three-dimensional structural features of VFs. The sequence channel combines a pre-trained language model with Convolutional Neural Network (CNN) and Transformer architectures to extract local and global features, respectively, from VF sequences. On the independent test set, the method achieved an accuracy of 86.47%, an F1 score of 86.20%, and an Area Under the Receiver Operating Characteristic Curve (AUC) of 97.20%, supporting its effectiveness. PLM-GNN can therefore classify the seven major VF types accurately, offering a new approach for studying their functions.
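For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy and an F1 score can be computed for a seven-class problem. It is illustrative only: the abstract does not state which F1 averaging scheme was used, so macro-averaging is an assumption here, and the function names and toy labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, num_classes=7):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical toy labels for seven classes (0..6); not data from the study.
toy_true = [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
toy_pred = [0, 1, 2, 3, 4, 5, 6, 1, 1, 0]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(toy_true, toy_pred):.4f}")
    print(f"macro F1 = {macro_f1(toy_true, toy_pred):.4f}")
```

Macro-averaging weights each class equally regardless of size. A weighted average would instead reflect class frequencies, so the two can differ on imbalanced data.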