Abstract
The vaginal microbiota plays a pivotal role in maintaining female reproductive health and homeostasis. However, most currently available Lactobacillus strains have been developed for the prevention and treatment of intestinal diseases, whereas strain-specific investigations targeting female reproductive health remain limited. Conventional cultivation and screening strategies are costly and labor-intensive, whereas machine learning enables rapid analysis of large-scale omics data and can facilitate efficient identification of candidate probiotics. In this study, 639 Lactobacillus crispatus isolates were recovered from vaginal secretions of healthy women using three selective media. Based on phylogenetic analysis, 67 representative strains were selected for in vitro assays of growth capacity, acidification capacity (pH reduction), lactic acid production, hydrogen peroxide production, and antimicrobial activity. A weighted scoring system established from these results identified three Lactobacillus crispatus strains with the best overall performance. Finally, k-mer-based genomic features of the strains were combined with a multi-stage feature selection strategy and multiple machine learning algorithms to develop a predictive pipeline, VLCPredictor, which scores the functional potential of vagina-derived Lactobacillus crispatus strains. Our study provides an efficient framework for the development and application of Lactobacillus crispatus in female reproductive tract health. It may have significant implications for maintaining microbial equilibrium in the female reproductive system, as well as for the prevention and treatment of gynecological infectious diseases.
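As a rough illustration of two ideas named above, the following minimal sketch counts k-mers in a genome sequence and combines assay results into a single weighted score. It is not the VLCPredictor implementation: the function names, the choice of k, the assay keys, the equal weights, and the assumption that assay values are already scaled to the range 0 to 1 are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter


def kmer_counts(seq, k=4):
    """Count overlapping k-mers in a DNA sequence.

    Windows containing characters other than A, C, G, T are skipped.
    The default k=4 is an arbitrary choice for illustration.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def weighted_score(assays, weights):
    """Weighted average of assay values assumed to be pre-scaled to [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * assays[name] for name in weights) / total_weight


# Hypothetical assay results and weights for one strain (not data from the study).
example_assays = {
    "growth": 0.8,
    "acidification": 0.6,
    "lactic_acid": 0.7,
    "h2o2": 0.5,
    "antimicrobial": 0.9,
}
example_weights = {name: 1.0 for name in example_assays}
example_score = weighted_score(example_assays, example_weights)
```

In a pipeline of this kind, the k-mer count vectors would typically serve as input features and the weighted score as the prediction target, but the actual feature selection stages and algorithms are those described in the full study.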