<i>K</i>-Means Clustering with Local Distance Privacy

Mengmeng Yang; Longxia Huang; Chenghua Tang

doi:10.26599/BDMA.2022.9020050

Big Data Mining and Analytics 2023, 6(4): 433-442 https://doi.org/10.26599/BDMA.2022.9020050

Open Access | Issue | Published: 29 August 2023

K-Means Clustering with Local Distance Privacy

Show Author's Information Hide Author's Information Mengmeng Yang^¹, Longxia Huang^², Chenghua Tang^³(

)

1 Data61, Commonwealth Scientific and Industrial Research Organization, Melbourne 3168, Australia

2 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

3 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541010, China

Keywords:

data analysis, K-means clustering, local differential privacy

Cite this article:

Yang M, Huang L, Tang C. K-Means Clustering with Local Distance Privacy. Big Data Mining and Analytics, 2023, 6(4): 433-442. https://doi.org/10.26599/BDMA.2022.9020050

Download citation

EndNote(RIS)

BibTeX

782

Views

302

Downloads

Citations

Crossref

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

With the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. $K$ -means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users’ data, which disclose users’ privacy. Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis. However, existing $K$ -means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users’ records in-distinguish from each other in high-dimensional space, we map the user’s record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other. To be specific, we generate a noisy distance first and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfies the local differential privacy guarantee and local $d_{E}$ -privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation results show that our methods outperform the traditional methods significantly.

Full text

Abstract

Full text

Outline

About this article

K-Means Clustering with Local Distance Privacy

Show Author's information Hide Author's Information Mengmeng Yang^¹, Longxia Huang^², Chenghua Tang^³(

)

1 Data61, Commonwealth Scientific and Industrial Research Organization, Melbourne 3168, Australia

2 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China

3 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541010, China

Abstract

Keywords: data analysis, K-means clustering, local differential privacy

References(27)

[1]

G. Guo and C. Altrjman, E-commerce customer segmentation method under improved k-means algorithm, in Proc. 4^th International Conference on Multi-Modal Information Analytics, Hohhot, China, 2022, pp. 1083–1089.

DOI Google Scholar

[2]

M. Jebakumari, T. Palaniraja, K. A. Patrick, and Ashwini , Blocking of spam mail using k-means clustering algorithm, International Journal of Information Technology and Computer Engineering (IJITC), vol. 2, no. 3, pp. 19–24, 2022.

DOI Google Scholar

[3]

M. Dhalaria and E. Gandotra, Android malware detection techniques: A literature review, Recent Patents on Engineering, vol. 15, no. 2, pp. 225–245, 2021.

DOI Google Scholar

[4]

Y. Qu, J. Zhang, R. Li, X. Zhang, X. Zhai, and S. Yu, Generative adversarial networks enhanced location privacy in 5G networks, Science China Information Sciences, vol. 63, no. 12, p. 220303, 2020.

DOI Google Scholar

[5]

Y. Qu, S. Yu, W. Zhou, S. Chen, and J. Wu, Customizable reliable privacy-preserving data sharing in cyber-physical social networks, IEEE Transactions on Network Science and Engineering, vol. 8, no. 1, pp. 269–281, 2020.

DOI Google Scholar

[6]

Y. Qu, L. Gao, Y. Xiang, S. Shen, and S. Yu, FedTwin: Blockchain-enabled adaptive asynchronous federated learning for digital twin networks, IEEE Network, vol. 36, no. 6, pp. 183–190, 2022.

DOI Google Scholar

[7]

U. Erlingsson, V. Pihur, and A. Korolova, RAPPOR: Randomized aggregatable privacy-preserving ordinal response, in Proc. 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 2014, pp. 1054–1067.

DOI Google Scholar

[8]

Apple differential privacy technical overview, https://www.apple.com/privacy/docs/Differential\_Privacy\_Overview.pdf, 2022.

[9]

B. Ding, J. Kulkarni, and S. Yekhanin, Collecting telemetry data privately, in Proc. 31^st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 2017, pp. 3574–3583.

Google Scholar

[10]

C. Dwork and A. Roth, The algorithmic foundations of differential privacy, Foundations and Trends in Theoretical Computer Science, vol. 9, nos. 3&4, pp. 211–407, 2014.

DOI Google Scholar

[11]

S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, What can we learn privately? SIAM Journal on Computing, vol. 40, no. 3, pp. 793–826, 2011.

DOI Google Scholar

[12]

M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi, Geo-indistinguishability: Differential privacy for location-based systems, in Proc. 2013 ACM SIGSAC Conference on Computer and Communications Security, Berlin, Germany, 2013, pp. 901–914.

DOI Google Scholar

[13]

N. Wang, X. Xiao, Y. Yang, J. Zhao, S. C. Hui, H. Shin, J. Shin, and G. Yu, Collecting and analyzing multidimensional data with local differential privacy, in Proc. 2019 IEEE 35^th International Conference on Data Engineering (ICDE), Macao, China, 2019, pp. 638–649.

DOI Google Scholar

[14]

J. C. Duchi, M. I. Jordan, and M. J. Wainwright, Minimax optimal procedures for locally private estimation, Journal of the American Statistical Association, vol. 113, no. 521, pp. 182–201, 2018.

DOI Google Scholar

[15]

A. Blum, C. Dwork, F. McSherry, and K. Nissim, Practical privacy: The SuLQ framework, in Proc. Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Baltimore, MD, USA, 2005, pp. 128–138.

DOI Google Scholar

[16]

T. Ni, M. Qiao, Z. Chen, S. Zhang, and H. Zhong, Utility-efficient differentially private k-means clustering based on cluster merging, Neurocomputing, vol. 424, pp. 205–214, 2021.

DOI Google Scholar

[17]

A. Smith, Privacy-preserving statistical estimation with optimal convergence rates, in Proc. 43^rd Annual ACM Symposium on Theory of Computing, San Jose, CA, USA, 2011, pp. 813–822.

DOI Google Scholar

[18]

J. Zhang, X. Xiao, Y. Yang, Z. Zhang, and M. Winslett, PrivGene: Differentially private model fitting using genetic algorithms, in Proc. 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2013, pp. 665–676.

DOI Google Scholar

[19]

Z. Lu and H. Shen, A convergent differentially private k-means clustering algorithm, in Proc. 23^rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Macao, China, 2019, pp. 612–624.

DOI Google Scholar

[20]

D. Su, J. Cao, N. Li, E. Bertino, M. Lyu, and H. Jin, Differentially private k-means clustering and a hybrid approach to private optimization, ACM Transactions on Privacy and Security, vol. 20, no. 4, pp. 1–33, 2017.

DOI Google Scholar

[21]

M. Yang, I. Tjuawinata, and K. Y. Lam, K-means clustering with local dx-privacy for privacy-preserving data analysis, IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2524–2537, 2022.

DOI Google Scholar

[22]

M. Yang, L. Lyu, J. Zhao, T. Zhu, and K. Y. Lam, Local differential privacy and its applications: A comprehensive survey, arXiv preprint arXiv: 2008.03686, 2020.

Google Scholar

[23]

K. Nissim and U. Stemmer, Clustering algorithms for the centralized and local models, Proceedings of Algorithmic Learning Theory, vol. 83, pp. 619–653, 2018.

Google Scholar

[24]

H. Kaplan and U. Stemmer, Differentially private k-means with constant multiplicative error, in Proc. 32^nd International Conference on Neural Information Processing Systems (NeurIPS), Montréal, Canada, 2018, pp. 5436–5446.

Google Scholar

[25]

U. Stemmer, Locally private k-means clustering, in Proc. Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, Salt Lake City, UT, USA, 2020, pp. 548–559.

DOI Google Scholar

[26]

C. Xia, J. Hua, W. Tong, and S. Zhong, Distributed k-means clustering guaranteeing local differential privacy, Computers & Security, vol. 90, p. 101699, 2020.

DOI Google Scholar

[27]

A. Chang, B. Ghazi, R. Kumar, and P. Manurangsi, Locally private k-means in one round, arXiv preprint arXiv: 2104.09734, 2021.

Google Scholar

About this article

Publication history

Rights and permissions

Publication history

Received: 01 August 2022

Revised: 10 October 2022

Accepted: 03 December 2022

Published: 29 August 2023

Issue date: December 2023

Copyright

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).