GANimate: Ultra-Efficient Lip-Landmark-Driven Talking Face Animation Using a Learned Kalman Filter on GAN Feature Latent Space for Human-Computer Interaction on Mobile Devices


Abstract

We present GANimate, a lightweight method for animating talking faces that leverages recent advances in latent-space manipulation of Generative Adversarial Networks (GANs). Unlike existing approaches built on computationally intensive diffusion models, transformers, or complex 3DMM representations, whose memory and compute demands make them impractical for mobile and other low-resource edge devices, GANimate is designed to run efficiently on low-memory, low-compute hardware. The model operates on 2D lip landmarks extracted from standard mobile vision-sensor input and requires no pre-training, making it easy to integrate with any lip-landmark generator. Through an optimization process in the GAN feature latent space, these landmarks act as geometric constraints that animate a static portrait with realistic and expressive lip movements. To maintain stability and visual coherence across frames, we employ a Kalman filter to detect and track lip landmarks during video synthesis, enabling adaptive refinement and improved temporal consistency. The result is a compact and modular framework that bridges the gap between performance and accessibility in talking-face synthesis, delivering high-quality, stable animations with minimal computational overhead. GANimate represents an important step toward lifelike, real-time avatars suitable for sensor-enabled and mobile human-computer interaction.
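The abstract does not specify how the learned Kalman filter is parameterized, so the sketch below is only a rough illustration of the landmark-tracking step: a classical constant-velocity Kalman filter smoothing a single 2D lip landmark across frames. The class and parameter names (`LandmarkKalmanSmoother`, `process_var`, `meas_var`) are hypothetical, and in GANimate the filter's noise statistics would presumably be learned rather than fixed as they are here.

```python
import numpy as np

class LandmarkKalmanSmoother:
    """Constant-velocity Kalman filter for one 2D landmark.

    State: [x, y, vx, vy]; measurement: [x, y].
    Illustrative only; GANimate learns its filter, this one uses fixed noise terms.
    """

    def __init__(self, process_var=1e-2, meas_var=1.0):
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # measurement model
        self.Q = process_var * np.eye(4)                 # process noise covariance
        self.R = meas_var * np.eye(2)                    # measurement noise covariance
        self.x = None                                    # state estimate
        self.P = np.eye(4)                               # state covariance

    def update(self, z):
        """Fuse a new 2D landmark measurement and return the smoothed position."""
        z = np.asarray(z, dtype=float)
        if self.x is None:                               # initialise on the first frame
            self.x = np.concatenate([z, np.zeros(2)])
            return z
        # Predict with the constant-velocity motion model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new measurement.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


if __name__ == "__main__":
    # Smooth a synthetic, noisy lip-landmark trajectory frame by frame.
    rng = np.random.default_rng(0)
    smoother = LandmarkKalmanSmoother()
    for t in range(30):
        noisy = np.array([40.0 + t, 80.0]) + rng.normal(0, 2.0, size=2)
        smoothed = smoother.update(noisy)
        print(t, noisy.round(2), smoothed.round(2))
```

In a full pipeline, one such filter per lip landmark would feed the smoothed coordinates into the latent-space optimization as geometric constraints, which is what gives the synthesized video its temporal stability.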
