YembaTones: A syllable-tone annotated dataset for speech recognition and prosodic analysis of the Yemba language



Abstract

Prosody is a key area of linguistics that studies tonal and rhythmic variation in speech. In tonal languages such as Yemba, prosody plays a crucial role in distinguishing words with different meanings or grammatical forms. However, despite the large number of native Yemba speakers in Cameroon, few resources exist for speech recognition and synthesis in the language. In this article, we present YembaTones, a dataset annotated at the syllable and tone levels. It was built from a dictionary of 344 Yemba/French words that we compiled from the most common phrases of the language, with words grouped so that the spellings within a group differ only by tone. The dataset was originally designed for training and evaluating tone detection models for tonal, low-resource languages. The words were recorded by 11 native speakers of Yemba, mainly linguistics specialists with a good command of the sounds of the language. Recordings were made with a dictaphone in various locations, such as the speakers' homes, campuses, and workplaces. They were then cleaned and segmented in Audacity into individual audio files, each corresponding to the pronunciation of one isolated word. After cleaning and segmentation, we selected 3420 good-quality audio files for annotation. Annotations were made at the syllable and tone levels using Praat. YembaTones is a valuable resource not only for training and evaluating automatic tone detection models but also for automatic speech recognition and speech synthesis of tonal and under-resourced languages, as well as for the study of Yemba prosody and phonetics and for research in speech acoustics and phonetic linguistics.
