Abstract
Polycomb Group (PcG) proteins, particularly those encoded by Enhancer of zeste [E(z)] genes, play essential roles in transcriptional repression and developmental regulation. To investigate their evolutionary history, we conducted a comprehensive comparative genomic analysis of E(z) homologs across green plants. Phylogenetic analysis revealed that E(z) genes are highly conserved and occur predominantly as single copies in green algae and early-diverging land plants. In seed plants, however, E(z) homologs fall into two major clades, CLF and SWN, which likely originated from an ancient duplication predating seed plant diversification. Conserved domain and motif analyses showed that all E(z) proteins contain the hallmark SET domain, whereas CXC and SANT domains are present only in certain lineages. Lineage-specific motif divergence was also observed, suggesting functional diversification. In angiosperms, further duplications shaped the SWN lineage: in Brassicaceae, SWN genes split into the SWN and MEA subclades, whereas in Fabaceae, SWN genes diverged into SWN1 and SWN2. Structural comparisons revealed that Brassicaceae MEA and Fabaceae SWN2 proteins independently lost approximately 200 amino acids from the central region, indicating convergent structural modification. Molecular evolutionary analysis showed that Fabaceae SWN1 genes are under purifying selection, consistent with retention of ancestral functions, whereas SWN2 genes experienced strong positive selection, implying functional innovation. Expression profiling of soybean E(z) genes supported this scenario: SWN1 is broadly expressed across tissues, whereas SWN2 expression is restricted to the heart-shaped embryo. This pattern mirrors that of Arabidopsis MEA, suggesting that Fabaceae SWN2 may have evolved imprinted-gene functions critical for seed development.
Together, our results highlight the evolutionary conservation of E(z) genes in plants and reveal how gene duplication and lineage-specific divergence have driven functional specialization, particularly in Fabaceae SWN2.