Abstract
The potential diversity in the global repertoire of human antibody sequences is currently not well understood due to the limited existing paired antibody heavy-light chain sequence data that has been hindered by the low throughput and high costs of current single-cell sequencing methods. Here, we report IgHuAb, a large language model for high-throughput generation of paired human antibody sequences. Using IgHuAb, we created SynAbLib, a synthetic human antibody library that mimics population-level features of naturally occurring human antibody sequences, yet is associated with significantly greater diversity in sequence space. Further, experimental validation of a diverse set of antibodies from SynAbLib showed robust expression yields. IgHuAb and SynAbLib provide a readily expandable platform for human monoclonal antibody generation that can be efficiently mined for antibody sequences with target properties.