Volume 23, Issue 4

Achieving Differential Privacy of Genomic Data Releasing via Belief Propagation

Zaobo He, Yingshu Li, Ji Li, Kaiyang Li, Qing Cai, Yi Liang
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Abstract

Privacy-preserving data release is an important problem in reconciling data openness with individual privacy. The state-of-the-art approach is differential privacy, which offers a strong privacy guarantee without restrictive assumptions about attackers' background knowledge. For genomic data with very high-dimensional attributes, however, current approaches based on differential privacy are not effective. Specifically, a large amount of noise must be injected into genomic data with tens of millions of SNPs (Single Nucleotide Polymorphisms), which significantly degrades the utility of the released data. To address this problem, this paper proposes a genomic data release method with a differential privacy guarantee. By executing belief propagation on a factor graph, the method factorizes the distribution of sensitive genomic data into a set of local distributions. After differential-privacy noise is injected into these local distributions, synthetic sensitive data are obtained by sampling from the noisy distributions. The synthetic sensitive data and the factor graph are then used to construct an approximate distribution of the non-sensitive data. Finally, non-sensitive genomic data are sampled from this approximate distribution to construct a synthetic genomic dataset.
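The noise-then-sample step described above can be illustrated with a minimal sketch for a single local distribution. This is not the paper's implementation: it omits the factor graph and belief propagation entirely, and it assumes the Laplace mechanism, add/remove-one neighboring datasets (so a histogram has L1 sensitivity 1), and hypothetical genotype counts for one SNP. All function names are illustrative.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(counts, epsilon, rng):
    """Perturb a histogram with Laplace noise, then clip and normalize.

    Assumption: each individual contributes to exactly one cell and
    neighboring datasets differ by adding or removing one individual,
    so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    Clipping and normalizing are post-processing steps.
    """
    scale = 1.0 / epsilon
    noisy = {k: max(0.0, c + laplace_noise(scale, rng)) for k, c in counts.items()}
    total = sum(noisy.values())
    if total == 0.0:
        # Every cell was clipped to zero: fall back to a uniform distribution.
        return {k: 1.0 / len(counts) for k in counts}
    return {k: v / total for k, v in noisy.items()}


def sample_synthetic(dist, n, rng):
    # Draw n synthetic values from the noisy local distribution.
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


rng = random.Random(0)
# Hypothetical counts of genotypes (0, 1, or 2 minor alleles) at one SNP.
counts = {0: 520, 1: 380, 2: 100}
dist = noisy_distribution(counts, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(dist, 1000, rng)
```

With tens of millions of SNPs, applying this kind of perturbation to every attribute independently would split the privacy budget very thinly, which is the utility problem the abstract describes; perturbing a small set of local distributions is the stated way around it.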

Keywords: differential privacy, SNP/trait associations, belief propagation, factor graph, data releasing


Publication history

Received: 21 September 2017
Accepted: 04 November 2017
Published: 16 August 2018
Issue date: August 2018

Copyright

© The authors 2018

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Nos. 61632010 and 61602129).
