Abstract
The genomes of 40 Brucella strains were retrieved from the NCBI database to investigate brucellosis at the genomic level, focusing on secondary metabolites, resistance genes, and virulence factors. Genome analysis software, secondary metabolite mining tools, and relevant gene databases were employed for the analysis. The genome sizes of these strains range from 4.88 to 6.00 Mb, with G+C content between 53.5% and 60.5%. Phylogenetic analysis classified the strains into three distinct clades: the Brucella anthropi CCUG 34461, Brucella sp. NBRC 13694, and Brucella anthropi MAG47 clades. Pan-genome analysis revealed 21,800 gene families, including 198 core genes and 10,371 unique genes, indicating an open pan-genome. The secondary metabolite mining software identified 350 gene clusters in 18 categories and predicted a total of 298 secondary metabolites, primarily arylpolyene, acyl-amino acids, betalactone, terpene, hydrogen cyanide, and NAGGN. Genome sequences uploaded in FASTA format to the CARD resistance gene database yielded seven resistance genes: rpsE, rpsL, rosA, golS, fabG, fabI, and uL3. B. anthropi SBA01 and B. media Q1108 harbored the highest number of drug resistance genes. Likewise, comparison of the sequences against the VFDB virulence gene database revealed eight virulence genes: lpxC, acpXL, fliY, bspJ, lpxA, fliI, fliQ, and bvrR. The B. cytisi IPA7.2 strain carried the highest number of virulence genes, and lpxC and acpXL are potentially unique to Brucella relative to other species. This study provides genomic data that elucidate the relationship between the pan-genome, core genome, and genome size, and predicts the types of secondary metabolites, resistance genes, and virulence genes present. These findings offer a basis for a fuller understanding of Brucella and lay a foundation for the prevention and treatment of brucellosis.
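The pan-genome counts reported above can be read as a partition of gene families by how many strains carry them. The following is a minimal illustrative sketch, not the pipeline used in this study: the function name `partition_pan_genome`, the strain labels, and the gene-family identifiers are hypothetical, and it assumes the usual definitions (core = present in every strain, unique = present in exactly one strain, accessory = everything in between). Under that assumption, and if the three categories are mutually exclusive, the reported totals would leave 21,800 - 198 - 10,371 = 11,231 accessory gene families.

```python
from collections import Counter


def partition_pan_genome(presence):
    """Split gene families into core, accessory, and unique sets.

    presence: dict mapping a strain name to the set of gene-family IDs
    found in that strain (assumes at least two strains).
    """
    n_strains = len(presence)
    # Count in how many strains each gene family occurs.
    counts = Counter(fam for fams in presence.values() for fam in fams)
    core = {fam for fam, c in counts.items() if c == n_strains}
    unique = {fam for fam, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy input with hypothetical strain and family names.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
core, accessory, unique = partition_pan_genome(toy)
print(len(core), len(accessory), len(unique))
```

In practice the presence/absence table would come from the output of a pan-genome tool; a large unique fraction that keeps growing as strains are added is what is usually meant by an open pan-genome.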