esFont: A guided diffusion and multimodal distillation to enhance the efficiency and stability in font design



Abstract

Font design offers a unique opportunity to blend artistic creativity with artificial intelligence. However, traditional design methods are time-consuming, especially for complex fonts or large character sets. Font transfer streamlines this process by learning font transitions to generate multiple styles from a target font. Yet existing Generative Adversarial Network (GAN) based approaches often suffer from instability, and current diffusion-based font generation methods typically depend on single-modal inputs, either visual or textual, which limits their capacity to capture detailed structural and semantic font features. Current diffusion models also carry high computational complexity because of their deep and redundant architectures. To address these challenges, we propose esFont, a novel guided diffusion framework. It incorporates a Contrastive Language-Image Pre-training (CLIP) based text encoder and a Vision Transformer (ViT) based image encoder, enriching the font transfer process through multimodal guidance from text and images. The model further integrates deep clipping (structural depth pruning) and timestep optimization, significantly reducing parameter complexity while maintaining performance. Experimental results show that esFont improves both efficiency and quality: structural accuracy (SSIM improved to 0.91), pixel-level fidelity (RMSE reduced to 2.68), perceptual quality aligned with human vision (LPIPS reduced to 0.07), and stylistic realism (FID decreased to 13.87). The model size is reduced to 100M parameters, training time to 1.3 hours, and inference time to 21 minutes. In summary, esFont advances font generation in both scientific and engineering terms by combining multimodal encoding, structural depth pruning, and timestep optimization.
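Of the four reported metrics, RMSE is the simplest to reproduce by hand. The sketch below illustrates how a pixel-level RMSE between a generated glyph and its ground-truth glyph could be computed. It is not the authors' evaluation code: the function name `rmse`, the nested-list image representation, and the 0-255 grayscale range are assumptions made for this example, since the abstract does not state the pixel scale or the averaging procedure.

```python
import math


def rmse(generated, target):
    """Root-mean-square error between two equally sized grayscale images.

    Images are nested lists of pixel intensities (assumed here to be 0-255;
    the abstract does not say which scale its RMSE of 2.68 refers to).
    """
    if len(generated) != len(target) or any(
        len(g_row) != len(t_row) for g_row, t_row in zip(generated, target)
    ):
        raise ValueError("images must have the same shape")

    # Squared per-pixel differences, flattened across all rows.
    squared = [
        (g - t) ** 2
        for g_row, t_row in zip(generated, target)
        for g, t in zip(g_row, t_row)
    ]
    if not squared:
        raise ValueError("images must not be empty")

    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 2x2 glyph patches, for illustration only.
target = [[0, 255], [255, 0]]
generated = [[0, 250], [255, 10]]

print(f"RMSE = {rmse(generated, target):.2f}")  # prints RMSE = 5.59
```

Lower values mean the generated glyph is closer to the reference pixel by pixel. SSIM, LPIPS, and FID capture structural, perceptual, and distributional similarity, and are normally computed with dedicated libraries.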
