Abstract
High-throughput genome sequencing and genotyping have significantly accelerated genetic research. However, the high cost of whole-genome sequencing (WGS) remains a barrier to large-scale studies such as genome-wide association studies (GWAS) and genomic prediction. Genotype imputation offers a cost-effective alternative by inferring unobserved variants from lower-density data using haplotype reference panels. In this study, we present the updated Pig Haplotype Reference Panel (PHARP) 4.0, comprising 6449 pig genomes from 154 breeds. PHARP 4.0 encompasses 50.3 million SNPs and 5.8 million indels, making it the largest and most diverse pig reference panel to date. When imputing from 80 K SNP chip data to WGS, PHARP 4.0 achieved higher imputation accuracy than existing panels (SWIM, AHC, AGIDB, and PGRP), with concordance rates (CR) > 0.99 and squared correlation coefficients (R²) > 0.98 in European breeds, and improved accuracy in Chinese Jinhua pigs (CR = 0.936, R² = 0.924). We further optimized an RNA-seq-based imputation pipeline by incorporating multiple breeds and applying a 6× sequencing depth filter, achieving CR > 0.95 and R² > 0.90 in European breeds, and CR = 0.93 and R² = 0.92 in Chinese Jinhua pigs. Additionally, increasing the specific reference panel size to approximately 400 samples improved the imputation of rare variants. Using PHARP 4.0, we imputed low-density SNP chip data for two GWAS and identified significant SNPs that likely represent causal variants. Overall, PHARP 4.0 serves as a valuable resource for advancing pig genetic research and supporting breeding programs.
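The abstract reports accuracy as a concordance rate (CR) and a correlation coefficient (R²) without stating how they are computed. As an illustration only, the sketch below uses the definitions common in the imputation literature, which may differ in detail from the ones used in this study: genotypes are coded as alternate-allele counts (0, 1, 2), CR is the fraction of imputed genotypes that match the true genotypes, and R² is the squared Pearson correlation between the two. The function names and the toy genotype vectors are hypothetical.

```python
def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotypes where the imputed call equals the true call."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation between true and imputed genotype codes."""
    n = len(true_gt)
    if n != len(imputed_gt) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when either vector has no variation
        # (for example, a monomorphic site).
        return float("nan")
    return (cov * cov) / (var_t * var_i)


# Toy data (assumed, not from the study): 8 genotypes coded as 0/1/2.
true_gt = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_gt = [0, 1, 2, 1, 0, 1, 1, 0]

print("CR =", concordance_rate(true_gt, imputed_gt))  # 7 of 8 match: 0.875
print("R2 =", round(squared_correlation(true_gt, imputed_gt), 3))
```

The two metrics are complementary: CR is dominated by common genotypes and can stay high even when rare variants are imputed poorly, whereas R² penalizes errors at low-frequency sites more heavily, which is why both are usually reported together.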