Class Restricted Clustering and Micro-Perturbation for Data Privacy

Abstract

The extensive use of information technologies by organizations to collect and share personal data has raised strong privacy concerns. In response to public demand for data privacy, clustering-based data masking techniques are increasingly used for privacy-preserving data sharing and analytics. Traditional clustering-based approaches for masking numeric attributes address re-identification risk but typically ignore the disclosure risk of categorical confidential attributes. We propose a new approach to this problem. The proposed method clusters data so that the data points within a group have similar non-confidential attribute values, while the confidential attribute values within each group are well distributed. To accomplish this, the clustering method, which is based on a minimum spanning tree (MST) technique, uses two risk-utility tradeoff measures, one in the growing stage and one in the pruning stage of the MST technique. As part of our approach, we also propose a novel cluster-level micro-perturbation method for masking data. It overcomes a common weakness of traditional clustering-based masking methods: their inability to preserve important statistical properties such as the variance of each attribute and the covariance between attributes. We show that the mean vector and covariance matrix of the masked data generated by the micro-perturbation method are unbiased estimates of the original mean vector and covariance matrix. An experimental study on several real-world datasets demonstrates the effectiveness of the proposed approach.
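The abstract does not give the two risk-utility measures or the perturbation formula, so the following is only a minimal sketch of the overall pipeline under stated assumptions, not the authors' method. Assumptions: the MST is built with Prim's algorithm on Euclidean distance over the numeric non-confidential attributes; the pruning rule is a hypothetical stand-in that cuts the longest edges first but only keeps a cut if every group retains at least `k` records and at least `l` distinct values of the categorical confidential attribute (an l-diversity-style constraint replacing the paper's tradeoff measures; the growing stage is plain Prim's here). For the masking step, it swaps in a different technique: random orthogonal mixing of the records inside each cluster, using a matrix that fixes the all-ones vector. This preserves each cluster's mean and covariance exactly, which is a stronger property than the unbiasedness the abstract claims for the authors' micro-perturbation. The names `restricted_partition`, `perturb_cluster`, `micro_perturb`, `k`, and `l` are hypothetical.

```python
import numpy as np


def prim_mst(X):
    """Euclidean minimum spanning tree over the rows of X (Prim's algorithm, O(n^2))."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()              # cheapest known link from each node into the tree
    parent = np.zeros(n, dtype=int)    # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


def components(n, edges):
    """Label the connected components of the forest formed by `edges`."""
    adj = [[] for _ in range(n)]
    for i, j, _ in edges:
        adj[i].append(j)
        adj[j].append(i)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return np.array(labels)


def restricted_partition(X, conf, k=3, l=2):
    """Hypothetical pruning rule: cut MST edges, longest first, keeping a cut only
    if every group still has >= k records and >= l distinct confidential values."""
    kept = sorted(prim_mst(X), key=lambda e: e[2], reverse=True)
    for edge in list(kept):
        trial = [e for e in kept if e is not edge]
        labels = components(len(X), trial)
        if all((labels == c).sum() >= k and len(set(conf[labels == c])) >= l
               for c in np.unique(labels)):
            kept = trial  # the cut respects both constraints, so keep it
    return components(len(X), kept)


def perturb_cluster(Xc, rng):
    """Mix a cluster's records with a random orthogonal Q that fixes the all-ones
    vector, so the cluster mean and covariance are preserved exactly."""
    n = Xc.shape[0]
    if n < 2:
        return Xc.copy()
    mu = Xc.mean(axis=0)
    basis, _ = np.linalg.qr(np.ones((n, 1)), mode="complete")
    H = basis[:, 1:]                       # orthonormal basis orthogonal to the ones vector
    q, r = np.linalg.qr(rng.standard_normal((n - 1, n - 1)))
    R = q * np.sign(np.diag(r))            # Haar-random orthogonal (n-1)x(n-1) matrix
    Q = np.ones((n, n)) / n + H @ R @ H.T  # orthogonal, with Q @ ones = ones
    return mu + Q @ (Xc - mu)


def micro_perturb(X, labels, seed=0):
    """Apply perturb_cluster to the numeric attributes of every cluster."""
    rng = np.random.default_rng(seed)
    Y = np.empty(X.shape, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        Y[idx] = perturb_cluster(X[idx].astype(float), rng)
    return Y


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
conf = np.array(["a", "b", "a", "b", "a", "b"])
labels = restricted_partition(X, conf, k=3, l=2)
Y = micro_perturb(X, labels)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
print(np.allclose(Y.mean(axis=0), X.mean(axis=0)),
      np.allclose(np.cov(Y, rowvar=False), np.cov(X, rowvar=False)))  # True True
```

Because the total scatter of the data is the sum of the within-cluster scatter and the between-cluster scatter of the cluster means, and this mixing leaves both unchanged, the overall mean vector and covariance matrix of the masked data equal the originals up to floating-point error. If the first group's confidential values were all `"a"`, the diversity constraint would block the cut and the sketch would return a single group.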
