Generating facial animation from emotion and speech cues significantly enhances a wide range of AI systems. The process begins by analyzing a speech signal to identify phoneme-emotion combinations, which are then translated into viseme-expression pairs for video animation. This study introduces a novel method for creating lifelike facial animations from emotional speech cues. We first pinpoint the acoustic features that best characterize each phoneme-emotion pair. An active learning method is then applied to select key facial frames that effectively represent these pairs; during this selection phase, a deep learning model identifies the most meaningful patches within each frame. The selected key frames are subsequently combined through morphing, yielding a fluid and visually appealing animation of facial expressions. Experiments demonstrate that the approach runs in real time on widely used mobile operating systems such as iOS and Android and delivers animations that closely match the speech and its emotional expression. We further present an application of the technique to table tennis live streaming.
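To make the final morphing step more concrete, the following is a minimal sketch, not the authors' implementation: it simply cross-dissolves between consecutive key facial frames (the frames chosen by the active learning stage) to produce in-between animation frames. All names (`key_frames`, `frames_between`, `morph_sequence`) are illustrative assumptions; the paper's actual morphing procedure may differ.

```python
# Minimal illustrative sketch of morphing between selected key facial frames.
# Assumes key_frames is a list of HxWx3 uint8 images; all names are hypothetical.
import numpy as np

def morph_sequence(key_frames, frames_between=10):
    """Linearly blend consecutive key frames into a smooth animation sequence.

    key_frames:     list of HxWx3 uint8 images selected as key frames.
    frames_between: number of in-between frames generated per key-frame pair.
    """
    animation = []
    for src, dst in zip(key_frames[:-1], key_frames[1:]):
        src_f = src.astype(np.float32)
        dst_f = dst.astype(np.float32)
        # Cross-dissolve: weight shifts gradually from the source to the target frame.
        for t in np.linspace(0.0, 1.0, frames_between, endpoint=False):
            blended = (1.0 - t) * src_f + t * dst_f
            animation.append(blended.astype(np.uint8))
    animation.append(key_frames[-1])
    return animation
```

A simple per-pixel blend like this is cheap enough for real-time playback on mobile devices, which is consistent with the on-device performance the abstract reports; more faithful morphing would typically also warp facial landmarks between frames before blending.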
Generating human facial animation by aggregation deep network and low-rank active learning with table tennis applications.
Authors: Li Yaolu, Tang Dongyang, Yang Yi
| Journal: | Scientific Reports | Impact factor: | 3.900 |
| Year: | 2025 | Citation: | 2025 Aug 1; 15(1):28169 |
| DOI: | 10.1038/s41598-025-13779-6 | | |
