Multimodal learning audio-visual detection for obtaining object-level sound sources in Japanese-language teaching room


Abstract

The combination of artificial intelligence and education is one of the current trends in research. Observing the daily teaching and learning process at school, we considered the possibility of using multimodal learning, in particular audio-visual detection (AVD), to improve teaching and learning in Japanese-language teaching rooms. AVD can effectively locate sounding objects (e.g. clapping, sneezing, organizing things) from unknown sources in online or physical classrooms. This study proposes a novel deep learning-based approach for AVD in Japanese-language teaching rooms that combines audio and visual information to detect sound sources at the object level. To evaluate the proposed method, we construct an AVD benchmark that provides object-level annotations for the sound sources in the videos. We demonstrate the feasibility of applying our method in the classroom by designing evaluation metrics for AVD and comparing it with similar works.
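The abstract does not specify how the audio and visual streams are fused, but a common object-level AVD step is to score each spatial region of a visual feature map against an audio clip embedding. The sketch below illustrates that idea with cosine similarity; the function name, shapes, and toy data are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def localize_sound_source(audio_emb, visual_feats):
    """Score each spatial cell of a visual feature map against an
    audio embedding via cosine similarity (a typical AVD fusion step;
    this is an illustrative assumption, not the paper's method).

    audio_emb:    (D,)       embedding of the audio clip
    visual_feats: (H, W, D)  per-region visual embeddings
    Returns an (H, W) localization map with values in [-1, 1].
    """
    a = audio_emb / (np.linalg.norm(audio_emb) + 1e-8)
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    return v @ a  # (H, W) similarity map

# Toy example: region (1, 2) is aligned with the audio embedding,
# so it should receive the highest score.
rng = np.random.default_rng(0)
emb = rng.standard_normal(8)
feats = rng.standard_normal((3, 4, 8))
feats[1, 2] = 2.0 * emb  # plant the "sounding object"
heat = localize_sound_source(emb, feats)
i, j = np.unravel_index(heat.argmax(), heat.shape)
print(int(i), int(j))  # → 1 2
```

In practice the embeddings would come from learned audio and visual encoders, and the similarity map would be thresholded or matched to object proposals to yield the object-level detections the abstract describes.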
