Abstract
The digital preservation of Chinese calligraphy, a significant intangible cultural heritage, is impeded by the limitations of traditional generative methods in achieving precise style control and structural fidelity. This paper addresses these shortcomings, which are particularly pronounced in models such as CalliGAN, by proposing a novel Transformer-enhanced Generative Adversarial Network (GAN) for calligraphic character style transfer and generation. Our approach, named Calliformer, tackles three key challenges: (1) We replace RNN-based component encoders, which struggle to capture global character structure, with a Structure-Aware Transformer encoder; this module models the spatial relationships between components more effectively via a dynamic structural attention bias mechanism. (2) To overcome the limited stylistic diversity of one-hot encodings, we introduce a simple yet effective style encoder that leverages a pre-trained convolutional neural network to extract rich, continuous style embeddings directly from reference images. (3) We introduce CCTS-2025, a publicly available calligraphy dataset annotated with explicit Chinese character structural relationships, to facilitate research in this domain. Experimental results demonstrate that our method achieves a 33.8% reduction in Mean Squared Error (MSE), from 19.49 to 12.91, and a 0.0965 increase in the Structural Similarity Index (SSIM) compared with the state-of-the-art baseline. Furthermore, a human evaluation study shows that 91.2% of the generated samples were judged authentic and stylistically superior by human participants. This research offers a new paradigm for cultural heritage digitization, with significant application potential in calligraphy education and artifact restoration.

Supplementary information: The online version contains supplementary material available at 10.1038/s41598-025-29262-1.
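
As context for the dynamic structural attention bias mentioned above, the following is a minimal PyTorch sketch of one way such a bias could be injected into component self-attention. The module name StructuralBiasAttention, the per-head learned relation bias, and all tensor shapes are illustrative assumptions, not the paper's exact Calliformer implementation.

```python
# Minimal sketch (PyTorch): self-attention over character components with an
# additive structural bias. Parameterization is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuralBiasAttention(nn.Module):
    """Multi-head self-attention whose logits receive a learned bias indexed
    by the structural relation (e.g. left-right, top-bottom, surround)
    between each pair of character components."""

    def __init__(self, dim: int, num_heads: int, num_relations: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One learnable scalar bias per (relation, head) pair.
        self.rel_bias = nn.Embedding(num_relations, num_heads)

    def forward(self, x: torch.Tensor, rel_ids: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) component embeddings;
        # rel_ids: (B, N, N) integer structural-relation labels per pair.
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5  # (B, H, N, N)
        bias = self.rel_bias(rel_ids).permute(0, 3, 1, 2)        # (B, H, N, N)
        attn = F.softmax(attn + bias, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)
```

Unlike a vanilla Transformer layer, the softmax here mixes content similarity with an explicit prior over inter-component layout, which is one plausible reading of how a structural bias can steer attention toward globally coherent character structure.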
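Similarly, the style encoder described in point (2) could be realized as a pre-trained CNN backbone pooled into a continuous embedding. The sketch below is a plausible minimal form under assumptions of our own: the choice of ResNet-18 via torchvision and the 128-dimensional embedding are illustrative, not the paper's specification.

```python
# Minimal sketch (PyTorch/torchvision): pooling features from a pre-trained
# CNN into a continuous style embedding. Backbone and dimensions assumed.
import torch
import torch.nn as nn
from torchvision import models

class StyleEncoder(nn.Module):
    def __init__(self, style_dim: int = 128):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Keep everything up to and including global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(backbone.fc.in_features, style_dim)

    def forward(self, ref_images: torch.Tensor) -> torch.Tensor:
        # ref_images: (B, 3, H, W) reference calligraphy images.
        feats = self.features(ref_images).flatten(1)  # (B, 512)
        return self.proj(feats)                       # (B, style_dim)

# Usage: a batch of reference images maps to continuous style codes.
style = StyleEncoder()(torch.randn(4, 3, 224, 224))  # shape (4, 128)
```

A continuous embedding of this kind, unlike a one-hot style label, can interpolate between calligraphers' styles and represent reference images outside the training set's fixed style inventory.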