Journal Home > Volume 27 , Issue 2

Graph data publication has been considered as an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners’ personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.


menu
Abstract
Full text
Outline
About this article

Efficient Publication of Distributed and Overlapping Graph Data Under Differential Privacy

Show Author's information Xu ZhengLizong Zhang( )Kaiyang LiXi Zeng
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
China Electronics Technology Cyber Security Co. Ltd., Chengdu 610000, China

Abstract

Graph data publication has been considered as an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners’ personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.

Keywords: differential privacy, graph data, distributed data publication

References(36)

[1]
M. S. Mahmud, J. Z. Huang, S. Salloum, T. Z. Emara, and K. Sadatdiynov, A survey of data partitioning and sampling methods to support big data analysis, Big Data Mining and Analytics, vol. 3, no. 2, pp. 85-101, 2020.
[2]
U. Kang and C. Faloutsos, Big graph mining: Algorithms and discoveries, ACM SIGKDD Explorat. Newsl., vol. 14, no. 2, pp. 29-36, 2013.
[3]
J. Wu and N. Wang, Approximating special social influence maximization problems, Tsinghua Science and Technology, vol. 25, no. 6, pp. 703-711, 2020.
[4]
Q. X. Hou, M. Han, and Z. P. Cai, Survey on data analysis in social media: A practical application aspect, Big Data Science and Technology, vol. 3, no. 4, pp. 259-279, 2020.
[5]
M. Togninalli, E. Ghisu, F. Llinares-López, B. Rieck, and K. Borgwardt, Wasserstein Weisfeiler-Lehman graph kernels, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 6439-6449.
[6]
L. Y. Liu, B. C. Yu, M. Han, S. S. Yuan, and N. Wang, Mild cognitive impairment understanding: An empirical study by data-driven approach, BMC Bioinformatics, vol. 20, no. 15, p. 481, 2019.
[7]
H. Kim, A. Bhattacharyya, and K. Anyanwu, Semantic query transformations for increased parallelization in distributed knowledge graph query processing, in Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA, 2019, pp. 1-14.
DOI
[8]
K. Ueta, X. Y. Xue, Y. Nakamoto, and S. Murakami, A distributed graph database for the data management of IoT systems, in Proc. 2016 IEEE Int. Conf. Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Chengdu, China, 2016, pp. 299-304.
DOI
[9]
Z. P. Cai and X. Zheng, A private and efficient mechanism for data uploading in smart cyber-physical systems, IEEE Trans. Netw. Sci. Eng., vol. 7, no. 2, pp. 766-775, 2020.
[10]
K. Al Fararni, F. Nafis, B. Aghoutane, A. Yahyaouy, J. Riffi, and A. Sabri, Hybrid recommender system for tourism based on big data and AI: A conceptual framework, Big Data Mining and Analytics, vol. 4, no. 1, pp. 47-55, 2021.
[11]
J. Li, X. Pei, X. J. Wang, D. Y. Yao, Y. Zhang, and Y. Yue, Transportation mode identification with GPS trajectory data and GIS information, Tsinghua Science and Technology, vol. 26, no. 4, pp. 403-416, 2021.
[12]
X. Y. Li, C. H. Zhang, T. Jung, J. W. Qian, and L. L. Chen, Graph-based privacy-preserving data publication, in Proc. 35th Ann. IEEE Int. Conf. Computer Communications, San Francisco, CA, USA, 2016, pp. 1-9.
DOI
[13]
W. Y. Day, N. H. Li, and M. Lyu, Publishing graph degree distribution with node differential privacy, in Proc. 2016 Int. Conf. Management of Data, San Francisco, CA, USA, 2016, pp. 123-138.
DOI
[14]
Z. B. He and J. X. Zhou, Inference attacks on genomic data based on probabilistic graphical models, Big Data Mining and Analytics, vol. 3, no. 3, pp. 225-233, 2020.
[15]
Z. P. Cai and Z. B. He, Trading private range counting over big IoT data, in Proc. 39th Int. Conf. on Distributed Computing Systems, Dallas, TX, USA, 2019, pp. 144-153.
DOI
[16]
L. Muñoz-González, D. Sgandurra, A. Paudice, and E. C. Lupu, Efficient attack graph analysis through approximate inference, ACM Transactions on Privacy and Security, vol. 20, no. 3, pp. 1-30, 2017.
[17]
J. Li, M. Siddula, X. Z. Cheng, W. Cheng, Z. Tian, and Y. S. Li, Approximate data aggregation in sensor equipped IoT networks, Tsinghua Science and Technology, vol. 25, no. 1, pp. 44-55, 2020.
[18]
K. Liu and E. Terzi, Towards identity anonymization on graphs, in Proc. 2008 ACM SIGMOD Int. Conf. Management of Data, Vancouver, Canada, 2008, pp. 93-106.
DOI
[19]
J. B. Wang, Z. P. Cai, and J. G. Yu, Achieving personalized k-anonymity-based content privacy for autonomous vehicles in CPS, IEEE Trans. Ind. Inform., vol. 16, no. 6, pp. 4242-4251, 2020.
[20]
C. Dwork, Differential privacy: A survey of results, in Proc. 5th Int. Conf. Theory and Applications of Models of Computation, Xi’an, China, 2008, pp. 1-19.
DOI
[21]
Z. Qin, T. Yu, Y. Yang, I. Khalil, X. K. Xiao, and K. Ren, Generating synthetic decentralized social graphs with local differential privacy, in Proc. ACM SIGSAC Conf. Computer and Communications Security, Dallas, TX, USA, 2017, pp. 425-438.
DOI
[22]
R. Bassily and A. Smith, Local, private, efficient protocols for succinct histograms, in Proc. 47th Ann. ACM Symp. Theory of Computing, Portland, OR, USA, 2015, pp. 127-135.
DOI
[23]
X. Zheng and Z. P. Cai, Privacy-preserved data sharing towards multiple parties in industrial IoTs, IEEE J. Sel. Areas Commun., vol. 38, no. 5, pp. 968-979, 2020.
[24]
H. P. Sun, X. K. Xiao, I. Khalil, Y. Yang, Z. Qin, H. Wang, and T. Yu, Analyzing subgraph statistics from extended local views with decentralized differential privacy, in Proc. 2019 ACM SIGSAC Conf. Computer and Communications Security, London, UK, 2019, pp. 703-717.
DOI
[25]
Y. X. Zhang, J. H. Wei, X. J. Zhang, X. X. Hu, and W. F. Liu, A two-phase algorithm for generating synthetic graph under local differential privacy, in Proc. 8th Int. Conf. Communication and Network Security, Qingdao, China, 2018, pp. 84-89.
DOI
[26]
T. C. Gao, F. Li, Y. Chen, and X. K. Zou, Local differential privately anonymizing online social networks under HRG-based model, IIEEE Trans. Comput. Soc. Syst., vol. 5, no. 4, pp. 1009-1020, 2018.
[27]
J. Zhang, C. K. Wang, J. M. Wang, J. X. Yu, J. Chen, and C. P. Wang, Inferring directions of undirected social ties, IEEE Trans. Knowl. Data Eng., vol. 28, no. 12, pp. 3276-3292, 2016.
[28]
C. K. Wang, C. P. Wang, Z. Wang, X. J. Ye, J. X. Yu, and B. Wang, DeepDirect: Learning directions of social ties with edge-based network embedding, IEEE Trans. Knowl. Data Eng., vol. 31, no. 12, pp. 2277-2291, 2019.
[29]
X. Zheng, G. C. Luo, and Z. P. Cai, A fair mechanism for private data publication in online social networks, IEEE Trans. Netw. Sci. Eng., vol. 7, no. 2, pp. 880-891, 2020.
[30]
T. C. Gao and F. Li, Privacy-preserving sketching for online social network data publication, in Proc. 16th Ann. IEEE Int. Conf. Sensing, Communication, and Networking, Boston, MA, USA, 2019, pp. 1-9.
DOI
[31]
T. L. Guo, J. Z. Luo, K. Dong, and M. Yang, Differentially private graph-link analysis based social recommendation, Inf. Sci., vols. 463&464, pp. 214-226, 2018.
[32]
C. Zhang, L. H. Zhu, C. Xu, K. Sharif, C. Zhang, and X. M. Liu, PGAS: Privacy-preserving graph encryption for accurate constrained shortest distance queries, Inf. Sci., vol. 506, pp. 325-345, 2020.
[33]
D. P. Xu, S. H. Yuan, X. T. Wu, and H. Phan, DPNE: Differentially private network embedding, in Proc. 22nd Pacific-Asia Conf. Knowledge Discovery and Data Mining, Melbourne, Australia, 2018, pp. 235-246.
DOI
[34]
S. Zhang and W. W. Ni, Graph embedding matrix sharing with differential privacy, IEEE Access, vol. 7, pp. 89390-89399, 2019.
[35]
Z. P. Cai, X. Zheng, and J. G. Yu, A differential-private framework for urban traffic flows estimation via taxi companies, IEEE Trans. Ind. Inform., vol. 15, no. 12, pp. 6492-6499, 2019.
[36]
F. McSherry and I. Mironov, Differentially private recommender systems: Building privacy into the netflix prize contenders, in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Paris, France, 2009, pp. 627-636.
DOI
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 26 December 2020
Revised: 19 February 2021
Accepted: 26 February 2021
Published: 29 September 2021
Issue date: April 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U19A2059 and 61802050) and Ministry of Science and Technology of Sichuan Province Program (Nos. 2021YFG0018 and 20ZDYF0343).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return