Abstract
Shiga toxin-producing Escherichia coli (STEC) are genetically diverse foodborne pathogens of major global public health concern. Serogroup-level identification is critical for effective surveillance and outbreak control; however, it is complicated by the genome plasticity and frequent recombination of STEC. In this study, we used a standardized pangenomic pipeline integrating the Roary ILP Bacterial Core Annotation Pipeline (RIBAP) and Panaroo to analyze 160 complete, high-quality STEC genomes representing eight major serogroups at a 95% sequence identity threshold. Candidate serogroup-specific markers were identified from the gene presence/absence profiles produced by RIBAP and Panaroo. Our analysis revealed several high-confidence markers, including metabolic genes (dgcE, fcl_2, dmsA, hisC) and surface polysaccharide-related genes (capD, rfbX, wzzB). Comparative pangenomic evaluation showed that RIBAP predicted a larger pangenome than Panaroo. Additionally, some genomes of serotypes O104:H1, O145:H28, and O45:H2 clustered outside their expected clades, indicating sporadic serotype misplacements in phylogenetic reconstructions. Functional annotation suggested that most candidate markers are involved in critical processes such as glucose metabolism, lipopolysaccharide biosynthesis, and cell surface assembly. Notably, approximately 22.9% of the identified proteins were annotated as hypothetical. Overall, this study highlights the utility of pangenomic analysis for identifying candidate markers of clinically relevant STEC serogroups and for phylogenetic interpretation. Pangenome analysis could also guide the development of more accurate diagnostic and surveillance tools.