Abstract
Outsourcing the storage and analysis of genomic data to third-party servers is often necessary given the scale of modern datasets, but it exposes sensitive sequence data to privacy risks. K-mer-based analyses are widely used in genomics research, clinical diagnostics, pathogen surveillance, and metagenomic classification, and they demand particular ethical and technical care when applied to human genomic data in clinical settings. We present a novel protocol based on homomorphic encryption that lets a client store a fully encrypted genome on an untrusted server and run private k-mer searches against it. The server never gains access to the client's plaintext genome sequence, nor does it learn the content of any k-mer query. After a one-time client-side encryption of the genome, the server performs all computations on ciphertext and returns only encrypted results, which can be decrypted solely by the data owner. This framework turns an honest-but-curious cloud server into a secure storage and computation system, enabling practical and confidential querying of encrypted, client-owned genomic data. The system supports exact k-mer searches as well as position weight matrix searches. Finally, we provide KmerCrypt, a private k-mer search toolkit that implements this protocol, giving researchers an efficient and secure way to query encrypted genomic datasets without compromising privacy.
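
To make the two supported query types concrete, the following plaintext Python sketch shows the functionality that such a protocol would evaluate. It is not the encrypted protocol and is not taken from KmerCrypt. It assumes one plausible formulation in which both query types reduce to a sliding sum of products over a one-hot encoded genome, a form that uses only additions and multiplications. All names here (`one_hot`, `window_scores`, `exact_kmer_positions`, `pwm_hits`) and the example matrix are hypothetical.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this illustration


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element 0/1 vector in ALPHABET order."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def window_scores(genome: str, matrix: Sequence[Sequence[float]]) -> List[float]:
    """Score every length-k window of the genome against a k x 4 matrix.

    Each score is a sum of products between the one-hot genome rows and the
    matrix rows, so only additions and multiplications are needed.
    """
    k = len(matrix)
    encoded = one_hot(genome)
    scores = []
    for start in range(len(genome) - k + 1):
        total = 0
        for offset in range(k):
            row = encoded[start + offset]
            weights = matrix[offset]
            total += sum(r * w for r, w in zip(row, weights))
        scores.append(total)
    return scores


def exact_kmer_positions(genome: str, kmer: str) -> List[int]:
    """Exact search: a window matches when all k positions agree (score == k)."""
    k = len(kmer)
    scores = window_scores(genome, one_hot(kmer))
    return [i for i, s in enumerate(scores) if s == k]


def pwm_hits(genome: str, pwm: Sequence[Sequence[float]],
             threshold: float) -> List[Tuple[int, float]]:
    """PWM search: report windows whose score reaches the threshold."""
    return [(i, s) for i, s in enumerate(window_scores(genome, pwm)) if s >= threshold]


if __name__ == "__main__":
    genome = "ACGTACGA"
    print(exact_kmer_positions(genome, "ACG"))
    # Toy matrix (made up for illustration): favors A, then C, then G or T.
    toy_pwm = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 1]]
    print(pwm_hits(genome, toy_pwm, threshold=5))
```

In this formulation an exact k-mer query is the special case of a position weight matrix whose rows are the one-hot encoding of the k-mer. In an encrypted setting the comparison against `k` or a threshold would presumably happen on the client after decryption, though that detail is an assumption rather than something stated here.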