Speech recognition using an English multimodal corpus with integrated image and depth information


Abstract

Traditional English corpora mainly collect information from a single modality and lack multimodal information, which lowers corpus quality and limits recognition accuracy. To address this problem, this paper introduces depth information into multimodal corpora and studies both the construction of an English multimodal corpus that integrates electronic images and depth information and a speech recognition method for that corpus. The adopted multimodal fusion strategy combines speech signals with image information, including key visual cues such as the speaker's lip movements and facial expressions, and uses deep learning techniques to extract acoustic and visual features. Experiments were conducted with the acoustic models in the Kaldi toolkit, leading to the following conclusions. With 15-dimensional lip features at an SNR (signal-to-noise ratio) of 10 dB, corpus A achieved 2.4% higher accuracy than corpus B under the monophone model and 1.7% higher accuracy under the triphone model. With 32-dimensional lip features at an SNR of 10 dB, corpus A achieved 1.4% higher accuracy than corpus B under the monophone model and 2.6% higher accuracy under the triphone model. The English multimodal corpus with image and depth information thus achieves high accuracy, and the depth information helps to improve it.
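The fusion strategy described above can be illustrated with a minimal sketch of feature-level (early) fusion, in which per-frame acoustic features are concatenated with per-frame visual lip features before being passed to an acoustic model. The function name, the 13-dimensional acoustic vectors, and the frame alignment assumption below are illustrative choices, not the paper's exact configuration; the 15-dimensional lip features mirror one of the settings reported in the abstract.

```python
# Hypothetical sketch of early (feature-level) audio-visual fusion.
# Assumptions: acoustic features are 13-dim MFCC-like vectors, visual
# features are 15-dim lip-shape vectors, and the video stream has
# already been interpolated to the audio frame rate.

def fuse_features(acoustic_frames, visual_frames):
    """Concatenate per-frame acoustic and visual feature vectors.

    Both inputs are lists with one feature vector per time frame;
    the fused output has one (acoustic + visual)-dim vector per frame.
    """
    if len(acoustic_frames) != len(visual_frames):
        raise ValueError("frame counts must match after alignment")
    return [a + v for a, v in zip(acoustic_frames, visual_frames)]

# Toy example: 3 frames of 13-dim acoustic + 15-dim lip features
# yield 3 fused frames of 28 dimensions each.
acoustic = [[0.0] * 13 for _ in range(3)]
visual = [[1.0] * 15 for _ in range(3)]
fused = fuse_features(acoustic, visual)
print(len(fused), len(fused[0]))  # 3 28
```

In a Kaldi-style pipeline, fused vectors like these would replace the plain acoustic features fed to the monophone or triphone model training stages.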
