Abstract
Fungi are implicated in a range of diseases; among them, Candida albicans (C. albicans) is recognized as an important pathogen in inflammatory bowel disease (IBD). Fungal pathogenicity is primarily mediated by virulence factors (VFs), so comprehensive identification of fungal VFs is critical for targeted drug development and disease treatment. However, current databases contain few fungal VFs, lack effective predictive algorithms, and do not directly provide the protein structural information needed for drug discovery. In this study, we constructed a positive dataset comprising 18,072 fungal VFs. Using machine learning approaches, we then predicted 390 potential VFs from 8,081 representative protein sequences drawn from the proteomes of 99 C. albicans strains, yielding a dedicated C. albicans VF dataset. In addition, we identified five IBD-associated pathogenic VFs and used their protein structural data, which are included in the dataset, to facilitate small-molecule compound screening. Collectively, this study provides a comprehensive data resource and theoretical foundation for identifying fungal VFs and developing related therapeutics.