Abstract
BACKGROUND: Carotenoids are essential plant pigments with key roles in stress tolerance and human nutrition. β-carotene is a major provitamin A carotenoid, and understanding the genetic basis of its natural variation in Capsicum annuum is important for nutritional improvement. However, carotenoid accumulation is a complex quantitative trait influenced by multiple metabolic and regulatory pathways.
METHODS: An extreme-phenotype genome-wide association study (XP-GWAS) was conducted using pooled genomic DNA from 92 C. annuum accessions representing contrasting extremes of β-carotene content. Fruit carotenoid levels from previously characterized accessions were used to establish high- and low-content groups, and genomic DNA from these same accessions was subjected to high-throughput paired-end sequencing. Variant calling yielded 19,066,129 raw variants, which were filtered to 1,025,269 high-confidence single nucleotide polymorphisms (SNPs) for association analysis.
RESULTS: XP-GWAS identified 91 SNPs showing significant allele frequency differences between the high- and low-β-carotene pools (FDR < 0.05). Of these, 19 were located on assembled chromosomes and 72 on unanchored scaffolds; the unanchored placement of most signals limits their immediate utility for functional validation and breeding applications. The gene encoding 4-hydroxyphenylpyruvate dioxygenase (HPPD) on chromosome 5 showed the most prominent clustered association signal, with multiple significant SNPs overlapping it. Prior studies indicate that HPPD participates in plastoquinone biosynthesis, which indirectly supports carotenoid desaturation; the present study, however, establishes a statistical association rather than functional validation in C. annuum. Additional SNPs were detected near non-coding RNAs and near genes involved in sulfur metabolism, ribosomal function, and signaling; these are interpreted as exploratory, hypothesis-generating signals requiring further validation.
CONCLUSIONS: This pooled XP-GWAS prioritizes HPPD and several additional genomic regions as candidate loci associated with β-carotene variation in C. annuum. Given the exploratory design, pooled sequencing strategy, and prevalence of unanchored scaffold signals, these associations should be viewed as hypothesis-generating and require independent validation before functional or breeding applications.