Abstract
We present GANimate, a lightweight method for animating talking faces that builds on recent advances in latent-space manipulation of Generative Adversarial Networks (GANs). Existing approaches built on computationally intensive diffusion models, transformers, or complex 3DMM representations are impractical on mobile and other low-resource edge devices because of their memory and compute demands; GANimate is designed to run efficiently in exactly these low-memory, low-compute settings. The model operates on 2D lip landmarks extracted from standard mobile vision-sensor inputs and requires no pre-training, so it integrates readily with any lip-landmark generator. Through an optimization process in the GAN's latent feature space, these landmarks act as geometric constraints that animate a static portrait, producing realistic and expressive lip movements. To maintain stability and visual coherence across frames, we employ a Kalman filter to track and smooth lip landmarks during video synthesis, enabling adaptive refinement and improved temporal consistency. The result is a compact, modular framework that bridges the gap between performance and accessibility in talking-face synthesis, delivering high-quality, stable animations with minimal computational overhead. GANimate represents a step toward lifelike, real-time avatars for sensor-enabled and mobile human-computer interaction.
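To make the two mechanisms named above concrete, the following is a minimal sketch, not the authors' implementation: a landmark-constrained optimization of a GAN latent code, with per-frame landmark measurements stabilized by a simple random-walk Kalman filter. The generator `G`, the landmark extractor `lips`, the helpers `animate_frame` and `kalman_step`, and all dimensions are hypothetical placeholders standing in for a pretrained GAN and a real landmark model.

```python
# A minimal sketch under assumed components; all names below are
# hypothetical placeholders, not GANimate's actual code.
import torch

torch.manual_seed(0)
LATENT_DIM, N_LANDMARKS = 512, 20

# Stand-ins: a real pipeline would load a pretrained generator and a
# differentiable lip-landmark extractor here.
G = torch.nn.Linear(LATENT_DIM, 3 * 64 * 64)           # latent -> flattened image
lips = torch.nn.Linear(3 * 64 * 64, 2 * N_LANDMARKS)   # image -> 2D lip landmarks
for p in list(G.parameters()) + list(lips.parameters()):
    p.requires_grad_(False)  # only the latent code is optimized

def animate_frame(w_init, target_lips, steps=50, lr=0.05, lam=1.0):
    """Optimize a latent code so the generated frame's lip landmarks match
    the target landmarks (the geometric constraint); a regularizer keeps
    the result near the starting latent to preserve the portrait."""
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)
        loss = (torch.nn.functional.mse_loss(lips(img), target_lips)
                + lam * torch.nn.functional.mse_loss(w, w_init))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

def kalman_step(est, var, meas, q=1e-3, r=1e-2):
    """One predict/update step of a per-coordinate random-walk Kalman
    filter, used to smooth noisy landmark measurements across frames."""
    var = var + q                       # predict: process noise inflates variance
    gain = var / (var + r)              # update: Kalman gain
    est = est + gain * (meas - est)
    return est, (1.0 - gain) * var

# Toy driving sequence: three frames of noisy lip-landmark measurements.
raw_landmarks = [torch.randn(1, 2 * N_LANDMARKS) for _ in range(3)]

w_prev = torch.randn(1, LATENT_DIM)     # latent code of the static portrait
est, var = raw_landmarks[0], torch.ones_like(raw_landmarks[0])
frames = []
for raw in raw_landmarks:
    est, var = kalman_step(est, var, raw)  # temporal smoothing of landmarks
    w_prev = animate_frame(w_prev, est)    # warm-start from the previous frame
    frames.append(G(w_prev))               # one animated frame per step
```

Warm-starting each frame's optimization from the previous frame's latent, together with the Kalman-smoothed landmark targets, is one plausible way to obtain the temporal consistency the abstract describes.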