Acoustic estimation of voice roughness



Abstract

Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. The first uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram; the second, faster algorithm begins with a conventional spectrogram obtained with the short-time Fourier transform (STFT). Both explain about 50% of the variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
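To make the STFT-based approach concrete, here is a minimal Python sketch of the general idea: take a conventional spectrogram, compute the temporal modulation spectrum of each frequency bin, and report the share of modulation energy that falls in the [50, 200] Hz range. This is an illustration under stated assumptions, not the soundgen implementation. The function names, the window and step sizes, and the lognormal parameters (peak_hz, sd_octaves) are all hypothetical choices made for the example. It also simplifies the modulation spectrum to a Fourier transform along the time axis only, which pools energy across spectral modulation frequencies.

```python
import numpy as np

def magnitude_spectrogram(signal, sr, win_ms=15.0, step_ms=1.0):
    """Conventional STFT magnitudes, shape (n_frames, n_bins), plus the frame rate in Hz.

    The 1 ms step is an assumed value: it gives a 1000 Hz frame rate, so
    modulation frequencies up to 500 Hz can be represented.
    """
    win = int(round(sr * win_ms / 1000))
    hop = int(round(sr * step_ms / 1000))
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)), sr / hop

def modulation_power(signal, sr):
    """Temporal modulation power summed over frequency bins, with the 0 Hz bin dropped."""
    spec, frame_rate = magnitude_spectrogram(signal, sr)
    power = np.abs(np.fft.rfft(spec, axis=0)) ** 2      # FFT along the time axis
    freqs = np.fft.rfftfreq(spec.shape[0], d=1.0 / frame_rate)
    return freqs[1:], power.sum(axis=1)[1:]

def cutoff_weights(freqs, lo=50.0, hi=200.0):
    """Simple cutoff points: weight 1 inside [lo, hi] Hz, 0 outside."""
    return ((freqs >= lo) & (freqs <= hi)).astype(float)

def lognormal_weights(freqs, peak_hz=100.0, sd_octaves=0.5):
    """Lognormal-shaped weighting: a Gaussian on a log2 frequency axis (assumed parameters)."""
    return np.exp(-0.5 * (np.log2(freqs / peak_hz) / sd_octaves) ** 2)

def roughness(signal, sr, weights_fn=cutoff_weights):
    """Weighted share of non-DC modulation energy, between 0 and 1."""
    freqs, power = modulation_power(signal, sr)
    total = power.sum()
    if total <= 0:
        return 0.0
    return float((weights_fn(freqs) * power).sum() / total)

# Synthetic demo: a 1 kHz tone with fast (70 Hz) or slow (5 Hz) amplitude modulation.
sr = 16000
t = np.arange(sr) / sr
carrier = np.sin(2 * np.pi * 1000 * t)
fast_am = (1 + np.cos(2 * np.pi * 70 * t)) * carrier
slow_am = (1 + np.cos(2 * np.pi * 5 * t)) * carrier

print("cutoff:   ", roughness(fast_am, sr), roughness(slow_am, sr))
print("lognormal:", roughness(fast_am, sr, lognormal_weights),
      roughness(slow_am, sr, lognormal_weights))
```

With either weighting, the tone modulated at 70 Hz should score much higher than the one modulated at 5 Hz, because only the former has modulation energy inside the roughness range. The cutoff version treats all modulation frequencies in the range equally, while the lognormal version lets the contribution fall off gradually on either side of an assumed peak.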
