Abstract
Text-to-video (T2V) generation can provide rich and diverse video content, but it suffers from typical issues such as content inconsistency between frames and text-alignment failure, which degrade video smoothness. Moreover, attempts to improve smoothness often sacrifice background texture and artistic expression through excessive smoothing. To address these problems, this paper proposes INR Smooth, a video smoothing strategy based on the inter-frame noise relationship, which can improve the smoothness of most T2V generation tasks. Building on INR Smooth, two video smoothing editing methods are proposed. The first targets training-based T2V models: guided by the studied inter-frame noise relationship, noise constraints are applied simultaneously from both the beginning and the end of the video, and a video smoothing loss function is constructed. The second targets training-free T2V models: DDIM Inversion is additionally introduced to preserve text alignment while improving smoothness. Experimental comparisons show that the proposed methods significantly improve text alignment and temporal consistency, and perform well on smooth transitions in real scenes and on the portrayal of artistic styles. Both the proposed training-free method and the zero-shot fine-tuning method require no additional computing resources. The source code and video demos are available at https://github.com/Cuihong-Yu/INR-Smooth.