Differential Privacy Preserving Voice Conversion for Audio Health Data



Abstract

BACKGROUND: Speech is a predominant mode of human communication. Digital speech recordings are inexpensive to collect and contain rich health-related information. Deep learning excels at detecting intricate patterns in such recordings, but it requires substantial training data. Laboratories have invested significantly in gathering extensive digital voice datasets for health insights. The challenge lies in sharing this data securely while protecting the speaker's privacy.

METHOD: We applied a Generative Adversarial Network (GAN) approach, which can generate a voice that closely resembles a real voice. The model is composed of four key components: (i) Generator, (ii) Formant Extractor, (iii) Speaker Embedding Extractor, and (iv) Discriminator. Training relies on an adversarial loss: the generator tries to produce a voice realistic enough to deceive the discriminator, while the discriminator tries to determine whether a given voice is authentic or synthetic. Over the course of training, the generator learns to produce a voice that closely resembles the real voice.

RESULT: We performed zero-shot voice conversion using an emotion-preserving GAN. The model preserves fundamental frequency trajectories, one of the most important features in dementia classification.

CONCLUSION: By successfully altering voice prints while preserving sound quality, our initial findings suggest a way to share raw digital voice recordings. Future efforts will extend beyond preserving intonation patterns and focus on preserving additional dementia-related markers in the original signal.
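
The adversarial training described in the METHOD section can be illustrated with a minimal sketch. The abstract does not state the exact form of the loss, so the standard binary cross-entropy GAN objective (with the non-saturating generator loss) is assumed here. The function names and the use of plain Python lists of discriminator logits are hypothetical, chosen only for illustration, and the Formant Extractor and Speaker Embedding Extractor are not modeled.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(real_logits, fake_logits):
    """Assumed objective: label real voices 1 and generated voices 0."""
    real_term = sum(bce(sigmoid(l), 1.0) for l in real_logits) / len(real_logits)
    fake_term = sum(bce(sigmoid(l), 0.0) for l in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits):
    """Assumed non-saturating loss: the generator is rewarded when the
    discriminator scores its generated voices as real (target 1)."""
    return sum(bce(sigmoid(l), 1.0) for l in fake_logits) / len(fake_logits)


if __name__ == "__main__":
    # Hypothetical discriminator logits for a small batch.
    real = [2.0, 1.5, 3.0]
    fake = [-1.0, 0.5, -2.0]
    print("discriminator loss:", discriminator_loss(real, fake))
    print("generator loss:", generator_loss(fake))
```

In this sketch the two losses pull in opposite directions on the generated samples: raising the discriminator's score for a generated voice lowers the generator loss and raises the discriminator loss, which is the competition the abstract describes.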
