Abstract
Image style transfer is a key research area in computer vision. Despite significant progress, challenges such as mode collapse, over-stylization, and incomplete style transfer persist, degrading image quality and generation stability. To address these issues, we introduce StyDiff, a novel framework that combines diffusion models with Adaptive Instance Normalization (AdaIN) to achieve high-quality and flexible style transfer. Specifically, StyDiff uses the AdaIN module to precisely blend content and style features, mitigating over-stylization and incomplete style transfer. The diffusion model refines image generation through a stepwise denoising process, ensuring consistency between content and style while substantially reducing artifacts. In addition, a multi-component loss function is designed to further balance content preservation and style expression. Experimental results demonstrate that StyDiff outperforms existing methods on key metrics such as SSIM, GM, and LPIPS, producing images with superior style consistency, content retention, and detail preservation. The approach offers a more stable and efficient solution for style transfer tasks, with strong potential for broad application.
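For reference, the feature-blending step named above builds on the standard AdaIN operation, which aligns the channel-wise statistics of the content features to those of the style features; the sketch below is the commonly used formulation, not necessarily the exact variant adopted in StyDiff:

\[
\mathrm{AdaIN}(x, y) \;=\; \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} \;+\; \mu(y),
\]

where \(x\) and \(y\) denote content and style feature maps, and \(\mu(\cdot)\), \(\sigma(\cdot)\) are the mean and standard deviation computed per channel over spatial locations.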