The Korean Speech Recognition Sentences: A Large Corpus for Evaluating Semantic Context and Language Experience in Speech Perception

韩语语音识别句子:用于评估语音感知中的语义上下文和语言经验的大型语料库

阅读:1

Abstract

PURPOSE: The aim of this study was to develop and validate a large Korean sentence set with varying degrees of semantic predictability that can be used for testing speech recognition and lexical processing. METHOD: Sentences differing in the degree of final-word predictability (predictable, neutral, and anomalous) were created with words selected to be suitable for both native and nonnative speakers of Korean. Semantic predictability was evaluated through a series of cloze tests in which native (n = 56) and nonnative (n = 19) speakers of Korean participated. This study also used a computer language model to evaluate final-word predictabilities; this is a novel approach that the current study adopted to reduce human effort in validating a large number of sentences, which produced results comparable to those of the cloze tests. In a speech recognition task, the sentences were presented to native (n = 23) and nonnative (n = 21) speakers of Korean in speech-shaped noise at two levels of noise. RESULTS: The results of the speech-in-noise experiment demonstrated that the intelligibility of the sentences was similar to that of related English corpora. That is, intelligibility was significantly different depending on the semantic condition, and the sentences had the right degree of difficulty for assessing intelligibility differences depending on noise levels and language experience. CONCLUSIONS: This corpus (1,021 sentences in total) adds to the target languages available in speech research and will allow researchers to investigate a range of issues in speech perception in Korean. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24045582.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。