IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization

Lu Yang; Xingshu Chen; Yonggang Luo; Xiao Lan; Wei Wang

doi:10.26599/TST.2020.9010031

Tsinghua Science and Technology 2022, 27(1): 127-140 https://doi.org/10.26599/TST.2020.9010031

Open Access | Issue | Published: 17 August 2021


College of Computer Science, Sichuan University, Chengdu 610065, China

School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China

Cyber Science Research Institute, Sichuan University, Chengdu 610065, China

Keywords: utility, anonymization, generalization, incomplete data streams, privacy preservation

Cite this article:

Yang L, Chen X, Luo Y, et al. IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization. Tsinghua Science and Technology, 2022, 27(1): 127-140. https://doi.org/10.26599/TST.2020.9010031


Abstract

Missing values are so prevalent in data streams collected in real environments that they cannot be ignored when preserving the privacy of those streams. However, most privacy preservation methods are developed without considering missing values. A few studies allow missing values to participate in data anonymization, but they introduce considerable additional information loss. To balance the utility and privacy preservation of incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a sliding-window-based processing framework is introduced to anonymize data streams continuously, in which each tuple can be output either through clustering or with anonymized clusters. We measure similarity along both the attribute dimension and the tuple dimension, which enables incomplete and complete records to be clustered together and yields clusters with minimal information loss. To avoid missing-value pollution, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments conducted on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.
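As an illustration of the generalization idea only, the following is a minimal Python sketch under stated assumptions. It is not the paper's algorithm: the abstract does not give the formal definitions. It assumes numeric quasi-identifiers, marks a missing value with None, and reads "maybe match" as "a missing value is compatible with any generalized value", so that a missing value does not widen (pollute) the range of the cluster. The names generalize and maybe_matches are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A record is a sequence of numeric quasi-identifier values.
# None marks a missing value (an assumption of this sketch).
Record = Sequence[Optional[float]]
Range = Optional[Tuple[float, float]]


def generalize(cluster: Sequence[Record]) -> List[Range]:
    """Generalize each attribute to the [min, max] range of its known values.

    Missing values are skipped, so they do not widen the range.
    An attribute with no known value in the cluster stays None.
    """
    n_attrs = len(cluster[0])
    ranges: List[Range] = []
    for j in range(n_attrs):
        known = [r[j] for r in cluster if r[j] is not None]
        ranges.append((min(known), max(known)) if known else None)
    return ranges


def maybe_matches(record: Record, ranges: Sequence[Range]) -> bool:
    """Return True if every known value of the record lies in its range.

    A missing value (or an attribute without a range) is treated as
    compatible with anything, which is this sketch's reading of
    "maybe match".
    """
    for value, rng in zip(record, ranges):
        if value is None or rng is None:
            continue
        low, high = rng
        if not low <= value <= high:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical cluster of (age, income) records with missing values.
    cluster = [(25, 50000), (31, None), (None, 42000)]
    ranges = generalize(cluster)
    print(ranges)
    print(all(maybe_matches(r, ranges) for r in cluster))
```

In this sketch every record of the cluster maybe-matches the generalized ranges, so the incomplete records are covered by the same published ranges as the complete one without forcing any attribute to be suppressed.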



About this article

Publication history

Received: 14 July 2020

Revised: 26 August 2020

Accepted: 01 September 2020

Published: 17 August 2021

Issue date: February 2022

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U19A2081 and 61802270), and the Fundamental Research Funds for the Central Universities (No. 2020SCUNG129).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).