Automated Assessment of Word- and Sentence-Level Speech Intelligibility in Developmental Motor Speech Disorders: A Cross-Linguistic Investigation



Abstract

Background/Objectives: Accurate assessment of speech intelligibility is necessary for individuals with motor speech disorders. Transcription or scaled rating methods by naïve listeners are the most reliable tasks for these purposes; however, they are often resource-intensive and time-consuming within clinical contexts. Automatic speech recognition (ASR) systems, which transcribe speech into text, have been increasingly utilized for assessing speech intelligibility. This study investigates the feasibility of using an open-source ASR system to assess speech intelligibility in Hebrew and English speakers with Down syndrome (DS).
Methods: Recordings from 65 Hebrew- and English-speaking participants were included: 33 speakers with DS and 32 typically developing (TD) peers. Speech samples (words, sentences) were transcribed using Whisper (OpenAI) and by naïve listeners. The proportion of agreement between ASR transcriptions and those of naïve listeners was compared across speaker groups (TD, DS) and languages (Hebrew, English) for word-level data. Further comparisons for Hebrew speakers were conducted across speaker groups and stimuli (words, sentences).
Results: The strength of the correlation between listener and ASR transcription scores varied across languages, and was higher for English (r = 0.98) than for Hebrew (r = 0.81) for speakers with DS. A higher proportion of listener-ASR agreement was demonstrated for TD speakers, as compared to those with DS (0.94 vs. 0.74, respectively), and for English, in comparison to Hebrew speakers (0.91 for English DS speakers vs. 0.74 for Hebrew DS speakers). Listener-ASR agreement for single words was consistently higher than for sentences among Hebrew speakers. Speakers' intelligibility influenced word-level agreement among Hebrew- but not English-speaking participants with DS.
Conclusions: ASR performance for English closely approximated that of naïve listeners, suggesting potential near-future clinical applicability within single-word intelligibility assessment. In contrast, a lower proportion of agreement between human listeners and ASR for Hebrew speech indicates that broader clinical implementation may require further training of ASR models in this language.
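
The two quantities reported above, the proportion of listener-ASR agreement and the correlation between listener and ASR scores, can be illustrated with a minimal sketch. The abstract does not describe the study's exact scoring procedure, so everything here is an assumption made for illustration: the item-by-item exact-match rule, the `normalize` step, the function names, and the toy English word lists are hypothetical and are not the authors' method or data.

```python
from math import sqrt


def normalize(token: str) -> str:
    """Lowercase and strip surrounding punctuation (an assumed scoring rule)."""
    return token.strip().strip(".,!?;:\"'").lower()


def agreement_proportion(asr_items, listener_items):
    """Fraction of items whose ASR transcription matches the listener's.

    Both lists must be aligned: position i in each refers to the same
    recorded word.
    """
    if len(asr_items) != len(listener_items):
        raise ValueError("transcription lists must be item-aligned")
    if not asr_items:
        raise ValueError("no items to score")
    matches = sum(
        normalize(a) == normalize(b) for a, b in zip(asr_items, listener_items)
    )
    return matches / len(asr_items)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical transcriptions of four single-word items from one speaker.
asr = ["cat", "Dog", "sun", "tree"]
listener = ["cat", "dog", "son", "tree"]
print(agreement_proportion(asr, listener))
```

In this sketch, per-speaker agreement proportions would be averaged within each group (TD, DS) and language to obtain group-level figures, and `pearson_r` would be applied to per-speaker intelligibility scores derived from listeners and from the ASR system.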
