Abstract
Peptides are natural information-bearing media and are promising for high-density data storage. However, conventional mapping, which assigns one amino acid (AA) to one binary code, offers only limited improvement in coding density as the total number of different AAs increases. Here, a novel composite mapping strategy is developed in which each position in the peptide sequence is a composite letter consisting of several different AAs. Thousands of composite letters are thus available for mapping, breaking the limit of conventional mapping. When 20 different AAs are used, the coding density of six-AA composite mapping reaches 15 bits/letter, whereas conventional mapping reaches only 4 bits/AA. The whole process is successfully demonstrated for the first time: encoding data into composite letter sequences, synthesizing the sequences via solid-phase peptide synthesis, sequencing them by mass spectrometry, and decoding the data from them. Composite mapping also shows several distinct advantages, including high coding density, few synthesis cycles, high reliability against errors, low probability of homopolymers, and good compatibility with other encoding algorithms. The developed strategy provides a novel way for peptide-based data storage to increase coding density and reduce synthesis cycles, showing great potential for large-scale data storage.
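The quoted coding densities can be checked with a short calculation. The sketch below assumes that a composite letter is an unordered set of k distinct AAs drawn from an alphabet of n AAs, and that only whole bits per letter are used; the abstract does not state this construction explicitly, so the function name `bits_per_letter` and the counting rule are illustrative assumptions rather than the authors' method.

```python
from math import comb


def bits_per_letter(n_amino_acids: int, k: int) -> int:
    """Whole bits per position, ASSUMING a letter is an unordered set
    of k distinct AAs chosen from n_amino_acids (hypothetical model)."""
    n_letters = comb(n_amino_acids, k)  # number of distinct letters
    # floor(log2(n_letters)) computed exactly with integer arithmetic
    return n_letters.bit_length() - 1


# Conventional mapping: one AA per position (k = 1)
conventional_bits = bits_per_letter(20, 1)

# Composite mapping: six different AAs per position (k = 6)
composite_letters = comb(20, 6)
composite_bits = bits_per_letter(20, 6)

print(f"conventional: {conventional_bits} bits/AA")
print(f"composite: {composite_letters} letters, {composite_bits} bits/letter")
```

Under this assumption, 20 AAs taken six at a time give 38,760 possible letters, which is consistent with the "thousands of composite letters" and the 15 bits/letter figure stated above, while a single AA per position yields 4 whole bits.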