MOTIVATION: The rapid growth of electronic medical records provides immense potential to researchers, but the records are often siloed at separate hospitals. As a result, federated networks have arisen, which allow simultaneous querying of medical databases at a group of connected institutions. The most basic such query is the aggregate count, e.g. "How many patients have diabetes?" However, depending on the protocol used to estimate that total, there is always a tradeoff between the accuracy of the estimate and the risk of leaking confidential data. Prior work has shown that it is possible to empirically control that tradeoff by using the HyperLogLog (HLL) probabilistic sketch. RESULTS: In this article, we prove complementary theoretical bounds on the k-anonymity privacy risk of using HLL sketches, and provide code to efficiently compute those bounds. AVAILABILITY AND IMPLEMENTATION: https://github.com/tzyRachel/K-anonymity-Expectation.
Expected 10-anonymity of HyperLogLog sketches for federated queries of clinical data repositories.
Authors: Tao Ziye, Weber Griffin M, Yu Yun William
| Journal: | Bioinformatics | Impact factor: | 5.400 |
| Year: | 2021 | Volume and pages: | 2021 Jul 12; 37(Suppl_1):i151-i160 |
| doi: | 10.1093/bioinformatics/btab292 | ||
