Wordsworth: A generative word dataset for comparison of speech representations in humans and neural networks

Abstract

Speech perception is fundamental for human communication, but its neural basis is not well understood. Furthermore, while modern neural networks (NNs) can accurately recognize speech, whether they effectively model human speech processing remains unclear. Here, we introduce Wordsworth, a dataset designed to facilitate comparisons of speech representations between artificial and biological NNs. We synthesised 1,200 tokens for each of 84 monosyllabic words while controlling for acoustic parameters such as amplitude, duration, and background noise, thus encouraging the use of phonetic features known to be important for speech perception. Human listening experiments showed that Wordsworth tokens are intelligible. Additional experiments using convolutional NNs showed (i) that Wordsworth tokens were recognizable and (ii) that error patterns could be at least partially explained by acoustic phonetics. The control with which tokens were created permits end users to manipulate them in whatever ways might be useful for their purposes. Finally, a subset of tokens specifically for human neuroscience experiments was also created, with precise and known distributions of amplitude, onset, and offset times.
