Journal Home > Volume 27 , Issue 2

Recently, local differential privacy (LDP) has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees. Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data. For the first time in the literature, we use LDP for distance estimation between distributed data to support more complicated data analysis. Specifically, we propose PrivBV—a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space. We also present an optimization strategy for reducing privacy leakage in the high-dimensional space. The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments. As study cases, we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.


menu
Abstract
Full text
Outline
About this article

PrivBV: Distance-Aware Encoding for Distributed Data with Local Differential Privacy

Show Author's information Lin SunGuolou PingXiaojun Ye( )
School of Software, Tsinghua University, Beijing 100084, China

Abstract

Recently, local differential privacy (LDP) has been used as the de facto standard for data sharing and analyzing with high-level privacy guarantees. Existing LDP-based mechanisms mainly focus on learning statistical information about the entire population from sensitive data. For the first time in the literature, we use LDP for distance estimation between distributed data to support more complicated data analysis. Specifically, we propose PrivBV—a locally differentially private bit vector mechanism with a distance-aware property in the anonymized space. We also present an optimization strategy for reducing privacy leakage in the high-dimensional space. The distance-aware property of PrivBV brings new insights into complicated data analysis in distributed environments. As study cases, we show the feasibility of applying PrivBV to privacy-preserving record linkage and non-interactive clustering. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed scheme.

Keywords: local differential privacy, privacy-preserving data publishing, non-interactive clustering

References(25)

[1]
W. Zhang, Z. Li, and X. Chen, Quality-aware user recruitment based on federated learning in mobile crowd sensing, Tsinghua Science and Technology, vol. 26, no. 6, pp. 869-877, 2021.
[2]
J. Y. Hua, G. Yue, and S. Zhong, Differentially private publication of general time-serial trajectory data, in 2015 IEEE Conf. Computer Communications, Hong Kong, China, 2015, pp. 549-557.
[3]
D. Vatsalan and P. Christen, Privacy-preserving matching of similar patients, J. Biomed. Inform., vol. 59, pp. 285-298, 2016.
[4]
D. Karapiperis, A. Gkoulalas-Divanis, and V. S. Verykios, FEDERAL: A framework for distance-aware privacy-preserving record linkage, IEEE Trans. Knowl. Data Eng., vol. 30, no. 2, pp. 292-304, 2018.
[5]
D. Karapiperis, A. Gkoulalas-Divanis, and V. S. Verykios, Distance-aware encoding of numerical values for privacy-preserving record linkage, in 2017 IEEE 33rd Int. Conf. Data Engineering, San Diego, CA, USA, 2017, pp. 135-138.
[6]
Y. Khazbak, J. Fan, S. Zhu and G. Cao, Preserving personalized location privacy in ride-hailing service, Tsinghua Science and Technology, vol. 25, no. 6, pp. 743-757, 2020.
[7]
P. Kairouz, S. Oh, and P. Viswanath, Extremal mechanisms for local differential privacy, J. Mach. Learning Res., vol. 17, no. 1, pp. 492-542, 2016.
[8]
M. Fanaeepour and B. I. P. Rubinstein, Histogramming privately ever after: Differentially-private data-dependent error bound optimisation, in 2018 IEEE 34th Int. Conf. Data Engineering, Paris, France, 2018, pp. 1204-1207.
[9]
R. Bassily and A. Smith, Local, private, efficient protocols for succinct histograms, in Proc. 47th Annu. ACM Symp. Theory of Computing, New York, NY, USA, 2015, pp. 127-135.
[10]
J. Hsu, S. Khanna, and A. Roth, Distributed private heavy hitters, in Int. Colloquium on Automata, Languages, and Programming, A. Czumaj, K. Mehlhorn, A. Pitts, and R. Wattenhofer, eds. Warwick, UK: Springer, 2012, pp. 461-472.
[11]
G. Cormode, T. Kulkarni, and D. Srivastava, Marginal release under local differential privacy, in Proc. 2018 Int. Conf. Management of Data, New York, NY, USA, 2018, pp. 131-146.
[12]
B. L. Ding, J. Kulkarni, and S. Yekhanin, Collecting telemetry data privately, in Advances in Neural Information Proc. Systems, Long Beach, CA, USA, 2017, pp. 3571-3580.
[13]
C. Dwork and A. Roth, The algorithmic foundations of differential privacy, Found. Trends Theoret. Comput. Sci., vol. 9, no. 3, pp. 211-407, 2014.
[14]
N. Wang, X. K. Xiao, Y. Yang, J. Zhao, S. C. Hui, H. Shin, J. Shin, and G. Yu, Collecting and Analyzing Multidimensional Data with Local Differential Privacy, in 2019 IEEE 35th Int. Conf. Data Engineering (ICDE), Macao, China, 2019, pp. 638-649.
[15]
D. Vatsalan, P. Christen, and V. S. Verykios, A taxonomy of privacy-preserving record linkage techniques, Informat. Syst., vol. 38, no. 6, pp. 946-969, 2013.
[16]
T. H. Wang, J. Blocki, N. H. Li, and S. Jha, Locally differentially private protocols for frequency estimation, in Proc. 26th USENIX Conf. Security Symp., Berkeley, CA, USA, 2017, pp. 729-745.
[17]
C. Dwork, Differential privacy, in Proc. 33rd Int. Conf. Automata, Languages and Programming, Venice, Italy, 2006, pp. 1-12.
[18]
B. Ding, H. Nori, P. Li, and J. Allen, Comparing population means under local differential privacy: With significance and power, in Proc. 32nd AAAI Conf. Artificial Intelligence, New Orleans, LA, USA, 2018, pp. 26-33.
[19]
T. H. Wang, N. H. Li, and S. Jha, Locally differentially private frequent itemset mining, in Proc. 2018 IEEE Symp. Security and Privacy, San Francisco, CA, USA, 2018, pp. 127-143.
[20]
Ú. Erlingsson, V. Pihur, and A. Korolova, RAPPOR: randomized aggregatable privacy-preserving ordinal response, in Proc. 2014 ACM SIGSAC Conf. Computer and Communications Security, Scottsdale, AZ, USA, 2014, pp. 1054-1067.
[21]
A. Lancichinetti, S. Fortunato, and J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New J. Phys., vol. 11, no. 3, pp. 033015, 2009.
[22]
J. MacQueen, Some methods for classification and analysis of multivariate observations, in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, Berkeley, CA, USA, 1967, pp. 281-297.
[23]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: machine learning in python, J. Mach. Learn. Res., vol. 12, pp. 2825-2830, 2011.
[24]
S. R. M. Oliveira and O. R. Zaiane, Privacy preserving clustering by data transformation, J. Inf. Data Manag., vol. 1, no. 1, pp. 37-37, 2010.
[25]
K. Liu, H. Kargupta, and J. Ryan, Random projection-based multiplicative data perturbation for privacy preserving distributed data mining, IEEE Trans. Knowl. Data Eng., vol. 18, no. 1, pp. 92-106, 2006.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 26 October 2020
Accepted: 24 March 2021
Published: 29 September 2021
Issue date: April 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by National Key Research and Development Program of China (Nos. 2019QY1402 and 2016YFB0800901).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return