Abstract
BACKGROUND: Although Sanger sequencing remains widely used in human genetic disease diagnosis and livestock breeding, software for analyzing such data has seen little innovation. Genotyping tens to hundreds of loci across hundreds or thousands of samples still typically relies on manual visual confirmation in traditional software, a process that is both time-consuming and error-prone.

RESULTS: We present SAGPEK, a tool that automatically identifies genotypes at target loci from hundreds to thousands of ABI-format Sanger sequencing files and directly outputs the results. SAGPEK extracts the signal intensities of the A, G, C, and T channels, performs base calling, and determines whether each site is homozygous or heterozygous. It then generates a primary sequence composed of the bases with the highest signal intensities and records the secondary bases at heterozygous sites. Using either built-in or user-provided anchor sequences, SAGPEK locates the coordinates of target loci, reports their genotypes, and, where applicable, annotates the corresponding amino acid changes.

CONCLUSIONS: SAGPEK provides an efficient, flexible, and user-friendly solution for analyzing ABI-format Sanger sequencing data, enabling simultaneous genotyping of tens of loci across hundreds of samples. Its innovation lies not in new base-calling methods but in integrating versatile functionality (batch genotyping, customizable anchor sequences, amino acid change reporting, chromatogram visualization, and local execution) into a single open-source package. This makes SAGPEK well suited to applications such as human genetic disease screening, drug-resistance mutation detection, and identification of functional mutations in livestock and other organisms.
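The base-calling and anchor-mapping steps described in the RESULTS can be illustrated with a minimal sketch. This is not SAGPEK's implementation. It assumes the per-position A/C/G/T peak intensities have already been extracted from the ABI file (parsing is not shown), and the heterozygosity cutoff `HET_RATIO = 0.3`, the function names `call_bases` and `genotype_at_anchor`, and the convention that the target locus lies `offset` bases after the end of the anchor are all hypothetical. Amino acid annotation is omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cutoff (not SAGPEK's value): a secondary peak at least 30% as
# tall as the primary peak is treated as evidence of a heterozygous site.
HET_RATIO = 0.3


def call_bases(peaks: List[Dict[str, float]]) -> Tuple[str, List[Optional[str]]]:
    """Build the primary sequence and record secondary bases at heterozygous sites.

    Each element of `peaks` maps 'A', 'C', 'G', 'T' to the signal intensity
    observed at one base position of the chromatogram.
    """
    primary: List[str] = []
    secondary: List[Optional[str]] = []
    for signal in peaks:
        # Rank the four channels from strongest to weakest signal.
        ranked = sorted(signal.items(), key=lambda kv: kv[1], reverse=True)
        (top_base, top_val), (second_base, second_val) = ranked[0], ranked[1]
        primary.append(top_base)
        if top_val > 0 and second_val / top_val >= HET_RATIO:
            secondary.append(second_base)  # heterozygous: keep the second base
        else:
            secondary.append(None)  # treated as homozygous at this position
    return "".join(primary), secondary


def genotype_at_anchor(primary: str, secondary: List[Optional[str]],
                       anchor: str, offset: int = 0) -> Optional[str]:
    """Return the genotype at the locus `offset` bases after the anchor ends.

    Hypothetical convention: with offset=0 the target is the first base
    immediately downstream of the anchor sequence.
    """
    hit = primary.find(anchor)
    if hit == -1:
        return None  # anchor not found in this read
    pos = hit + len(anchor) + offset
    if pos >= len(primary):
        return None  # target lies beyond the end of the read
    first = primary[pos]
    second = secondary[pos] or first  # homozygous sites repeat the primary base
    return "/".join(sorted((first, second)))


# Usage with made-up intensities for a five-base read.
peaks = [
    {"A": 900, "C": 20, "G": 15, "T": 10},
    {"A": 10, "C": 850, "G": 30, "T": 25},
    {"A": 12, "C": 18, "G": 880, "T": 20},
    {"A": 5, "C": 400, "G": 10, "T": 820},  # strong secondary C peak
    {"A": 700, "C": 10, "G": 15, "T": 20},
]
primary, secondary = call_bases(peaks)
print(primary)                                           # ACGTA
print(genotype_at_anchor(primary, secondary, "ACG"))     # C/T
print(genotype_at_anchor(primary, secondary, "CGT"))     # A/A
```

Searching for the anchor on the primary sequence keeps locus mapping independent of where the read starts; a fuller tool would also need to handle reverse-strand reads and heterozygous sites that fall inside the anchor itself.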