Abstract
MOTIVATION: Ancient DNA studies rely heavily on the EIGENSTRAT genotype format (.geno, .ind and .snp files) for standard population genetic analyses, including PCA, f-statistics and qpWave/qpAdm. However, little software is available for processing data in this format. This work presents pygenstrat, a Python package that provides a command-line interface for EIGENSTRAT data processing with extensive filtering, subsetting and conversion options. pygenstrat implements memory-efficient, chunked processing algorithms that handle large ancient DNA datasets with a small memory footprint. Supported operations include updating individual and SNP files, subsetting datasets by individual or SNP, filtering by minor allele frequency and missingness, pseudo-haploidisation and allele polarisation, as well as conversion between the EIGENSTRAT (text) and packed ANCESTRYMAP (binary) formats. Its modular architecture and Python implementation allow straightforward integration into custom pipelines and future extension.
RESULTS: Benchmarking on the Allen Ancient DNA Resource (AADR, v62.0) shows 2–15× speedups and 90–95% lower memory usage compared with EIGENSOFT's convertf, while producing equivalent outputs for standard operations. These improvements shorten turnaround times in ancient DNA workflows and facilitate reproducible processing.
AVAILABILITY AND IMPLEMENTATION: pygenstrat is open source and available at https://github.com/dkoptekin/pygenstrat.
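To illustrate what chunked missingness and minor allele frequency (MAF) filtering over a .geno file can look like, the following is a minimal sketch, not pygenstrat's actual implementation or API. It assumes the standard EIGENSTRAT .geno encoding (one row per SNP, one character per individual, 0/1/2 = copies of the reference allele, 9 = missing) and diploid counting; the function names, chunk size and default thresholds are hypothetical and chosen only for illustration.

```python
import io
from typing import Iterator, List, TextIO, Tuple

MISSING = "9"  # EIGENSTRAT code for a missing genotype


def iter_geno_chunks(handle: TextIO, chunk_size: int = 10000) -> Iterator[List[str]]:
    """Hypothetical helper: yield batches of at most chunk_size SNP rows,
    so only one batch is held in memory at a time."""
    chunk: List[str] = []
    for line in handle:
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter, batch


def snp_stats(row: str) -> Tuple[float, float]:
    """Return (missing fraction, minor allele frequency) for one .geno row,
    assuming diploid 0/1/2 counts of the reference allele."""
    called = [int(c) for c in row if c != MISSING]
    missing = 1 - len(called) / len(row)
    if not called:
        return missing, 0.0
    ref_freq = sum(called) / (2 * len(called))
    return missing, min(ref_freq, 1 - ref_freq)


def filter_geno(handle: TextIO, out: TextIO, max_missing: float = 0.5,
                min_maf: float = 0.01, chunk_size: int = 10000) -> List[int]:
    """Hypothetical filter: write passing rows to out and return their
    0-based SNP indices so the matching .snp lines can be subset too."""
    kept: List[int] = []
    idx = 0
    for chunk in iter_geno_chunks(handle, chunk_size):
        for row in chunk:
            missing, maf = snp_stats(row)
            if missing <= max_missing and maf >= min_maf:
                out.write(row + "\n")
                kept.append(idx)
            idx += 1
    return kept


if __name__ == "__main__":
    # Four SNPs x four individuals: row 2 is monomorphic, row 3 is fully missing.
    geno = io.StringIO("0229\n2222\n9999\n0122\n")
    out = io.StringIO()
    print(filter_geno(geno, out))  # indices of SNPs that pass both filters
    print(out.getvalue(), end="")
```

Returning the kept indices, rather than filtering the .geno file alone, reflects the fact that the .geno and .snp files are row-aligned, so any SNP subset has to be applied to both; how pygenstrat itself tracks this is not specified here.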