NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram

Abstract

As technology has evolved, voice-based communication has become important in applications such as online conferencing, online meetings, and voice over Internet Protocol (VoIP). Environmental noise, encoding and decoding of the speech signal, and technological limitations may degrade speech quality, so the quality of the speech signal needs to be assessed continuously. Speech quality assessment (SQA) enables a system to tune network parameters automatically to improve speech quality. Many speech transmitters and receivers used for voice processing, including mobile devices and high-performance computers, can also benefit from SQA, and SQA plays a significant role in the evaluation of speech-processing systems. Non-intrusive speech quality assessment (NI-SQA) is challenging because pristine reference speech signals are unavailable in real-world scenarios. The success of NI-SQA techniques relies heavily on the features used to assess speech quality. Existing NI-SQA methods extract features from speech signals in different domains, but they do not take the natural structure of the speech signal into account. This work proposes an NI-SQA method based on the natural structure of speech signals, approximated using natural spectrogram statistical (NSS) properties derived from the speech signal spectrogram. A pristine speech signal follows a structured natural pattern that is disrupted when distortion is introduced. The deviation of the NSS properties of a distorted speech signal from those of pristine speech is used to predict speech quality.
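
The abstract does not define the mean subtracted contrast normalized (MSCN) coefficients named in the title. The sketch below uses the form commonly seen in natural-scene-statistics work on images, (S - mu) / (sigma + C), where mu and sigma are Gaussian-weighted local mean and standard deviation, and applies it to a log-magnitude spectrogram. The frame length, hop size, 7-tap window, and constant C are assumed values chosen for illustration; they are not taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude STFT, shape (freq, time). Frame settings are assumptions."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T


def gaussian_kernel(size=7, sigma=7 / 6):
    """1-D Gaussian window that sums to 1 (size must be odd; values assumed)."""
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def local_mean(a, k):
    """Separable Gaussian-weighted local mean with edge-replicated padding."""
    p = np.pad(a, len(k) // 2, mode="edge")
    # Filter along time (axis 1), then along frequency (axis 0).
    p = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, p)


def mscn(spec, c=1e-3):
    """MSCN coefficients of a 2-D spectrogram; c avoids division by zero (assumed)."""
    k = gaussian_kernel()
    mu = local_mean(spec, k)
    var = local_mean(spec ** 2, k) - mu ** 2
    sigma = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative rounding errors
    return (spec - mu) / (sigma + c)


# Synthetic stand-in for a speech signal: one second of a tone plus noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)
coeffs = mscn(log_spectrogram(signal))
```

Statistics of `coeffs` (for example, the parameters of a distribution fitted to its histogram) could then serve as NSS features; which statistics the proposed method actually uses is not stated in the abstract.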
The proposed method outperforms state-of-the-art NI-SQA methods on the Centre for Speech Technology Voice Cloning Toolkit corpus (VCTK-Corpus), with a Spearman's rank-order correlation coefficient (SRC) of 0.902, a Pearson correlation coefficient (PCC) of 0.960, and a root mean squared error (RMSE) of 0.206. On the NOIZEUS-960 database, it achieves an SRC of 0.958, a PCC of 0.960, and an RMSE of 0.114.
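
For readers unfamiliar with the three reported measures, here is a minimal sketch of how SRC, PCC, and RMSE can be computed between subjective scores and predicted scores. It uses the standard textbook definitions; the score lists are made-up illustrative values, not data from the paper, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical subjective scores and model predictions, for illustration only.
subjective = [1.2, 2.5, 3.1, 3.8, 4.6]
predicted = [1.5, 2.2, 3.4, 3.6, 4.4]
print(spearman(subjective, predicted), pearson(subjective, predicted),
      rmse(subjective, predicted))
```

SRC measures how well the predicted ranking matches the subjective ranking, PCC measures linear agreement, and RMSE measures the absolute prediction error, so higher SRC and PCC and lower RMSE indicate better agreement.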
