Abstract
MOTIVATION: Biosecurity concerns have grown as the capabilities of deep-learning-based protein design have rapidly increased in recent years. Current regulatory procedures for DNA synthesis focus on biosecurity but neglect data privacy. RESULTS: We propose a general framework for adding watermarks to protein sequences designed by various autoregressive deep-learning models. Compared with current regulatory procedures, watermarks provide robust traceability, and thus biosecurity, while preserving the privacy of designed sequences because verification is performed locally. Benchmarked against other watermarking techniques, our method substantially improves watermark detection efficiency, making it more practical in real-world scenarios. Moreover, it offers researchers a convenient way to claim their intellectual property, since the designer's information can be embedded into the sequence with our framework. AVAILABILITY AND IMPLEMENTATION: The implementation of the protein watermark framework is freely available to noncommercial users at https://github.com/poseidonchan/ProteinWatermark.