Fidelity-agnostic synthetic data generation improves utility while retaining privacy

Abstract

Synthetic data are a popular means of publishing datasets in a privacy-aware manner, which makes them valuable across a range of scientific domains involving human subjects. They are typically generated by sampling from algorithms that mimic the probability distribution of real datasets, thereby maximizing statistical similarity to real data. However, we argue and demonstrate that synthetic data need to be similar only in ways relevant to their intended use and may neglect any irrelevant information, which in turn may improve privacy protection. As such, we propose a data synthesis method entitled fidelity-agnostic synthetic data. The method first extracts features relevant to the dataset's intended use using a neural network and then generates synthetic versions of the extracted features, after which they are decoded to mimic the real dataset. We show that our synthetic data improve performance in prediction tasks while retaining privacy protection compared to other state-of-the-art methods.
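The three steps named in the abstract (extract task-relevant features, synthesize those features, decode them back to the data space) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the feature generator, or the decoder. The sketch assumes a one-hidden-layer regression network as the feature extractor, a multivariate Gaussian as a stand-in generator, and a linear least-squares decoder. All function names and hyperparameters (`fit_encoder`, `encode`, `synthesize`, `hidden`, `epochs`, `lr`) are hypothetical.

```python
import numpy as np

def fit_encoder(X, y, hidden=4, epochs=500, lr=0.05, seed=0):
    """Step 1 (assumed form): train a small regression net on the intended task."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden features, shape (n, hidden)
        err = H @ w2 + b2 - y                  # prediction error, shape (n,)
        # Gradients of half the mean squared error.
        g_w2 = H.T @ err / n
        g_b2 = err.mean()
        dH = np.outer(err, w2) * (1.0 - H**2)  # backprop through tanh
        g_W1 = X.T @ dH / n
        g_b1 = dH.mean(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return W1, b1, w2, b2

def encode(X, W1, b1):
    """Map raw rows to the task-relevant hidden features."""
    return np.tanh(X @ W1 + b1)

def synthesize(X, y, n_samples, seed=0):
    """Run the three-step pipeline and return synthetic rows and targets."""
    rng = np.random.default_rng(seed)
    W1, b1, w2, b2 = fit_encoder(X, y, seed=seed)
    H = encode(X, W1, b1)

    # Step 2 (stand-in generator): sample features from a fitted Gaussian.
    mean = H.mean(axis=0)
    cov = np.cov(H, rowvar=False) + 1e-6 * np.eye(H.shape[1])
    H_syn = rng.multivariate_normal(mean, cov, size=n_samples)

    # Step 3 (stand-in decoder): linear least-squares map from features to X.
    A = np.column_stack([H, np.ones(len(H))])
    D, *_ = np.linalg.lstsq(A, X, rcond=None)
    X_syn = np.column_stack([H_syn, np.ones(n_samples)]) @ D

    # Synthetic targets come from the network's output layer.
    y_syn = H_syn @ w2 + b2
    return X_syn, y_syn

# Toy data: only the first two of five columns matter for the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
X_syn, y_syn = synthesize(X, y, n_samples=50)
```

In this toy setup the decoder only sees the hidden features, so variation in the three irrelevant columns is largely not reproduced in `X_syn`. That mirrors, in a very simplified form, the abstract's argument that neglecting task-irrelevant information may help privacy; the sketch itself makes no privacy guarantee.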
