Volume 27, Issue 1

IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization

Lu Yang, Xingshu Chen, Yonggang Luo, Xiao Lan, Wei Wang
College of Computer Science, Sichuan University, Chengdu 610065, China
School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
Cyber Science Research Institute, Sichuan University, Chengdu 610065, China

Abstract

Missing values are so prevalent in data streams collected in real environments that privacy preservation for data streams cannot ignore them. However, most privacy preservation methods are developed without considering missing values. A few studies allow incomplete records to participate in data anonymization, but at the cost of considerable additional information loss. To balance the utility and privacy preservation of incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a sliding-window-based processing framework anonymizes data streams continuously, and each tuple can be output either through clustering or through anonymized clusters. We use both the attribute dimension and the tuple dimension in the similarity measurement, which enables incomplete records to be clustered together with complete records and generates clusters with minimal information loss. To avoid pollution by missing values, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments conducted on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.

Keywords: utility, anonymization, generalization, incomplete data streams, privacy preservation
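
The abstract gives no formulas, so the following Python sketch is only one possible reading of the ideas it names and is not the authors' algorithm. It assumes numeric attributes with `None` marking a missing value, measures the attribute dimension as the Jaccard similarity of the sets of present attributes, measures the tuple dimension as a normalized gap over shared attributes, and reads "maybe match" as "a missing value may match any value", so missing cells stay missing and do not widen the generalized ranges. All function names, the way the two dimensions are combined, and the parameters `k`, `window_size`, and `ranges` are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional, Sequence

Record = Sequence[Optional[float]]  # None marks a missing value (assumption)


def attribute_similarity(a: Record, b: Record) -> float:
    """Attribute dimension: Jaccard similarity of the sets of present attributes."""
    present_a = {i for i, v in enumerate(a) if v is not None}
    present_b = {i for i, v in enumerate(b) if v is not None}
    union = present_a | present_b
    return len(present_a & present_b) / len(union) if union else 1.0


def tuple_distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Tuple dimension: mean normalized gap over attributes present in both records."""
    shared = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if not shared:
        return 1.0
    return sum(abs(a[i] - b[i]) / ranges[i] for i in shared) / len(shared)


def distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """Hypothetical combination of both dimensions into one value in [0, 1]."""
    return 1.0 - attribute_similarity(a, b) * (1.0 - tuple_distance(a, b, ranges))


def generalize(cluster: List[Record]) -> List[tuple]:
    """Replace each known value by the (min, max) of the known values in its
    column; missing cells stay None so they do not pollute the ranges."""
    n_attrs = len(cluster[0])
    bounds = []
    for i in range(n_attrs):
        known = [r[i] for r in cluster if r[i] is not None]
        bounds.append((min(known), max(known)) if known else None)
    return [
        tuple(None if r[i] is None else bounds[i] for i in range(n_attrs))
        for r in cluster
    ]


def anonymize_stream(
    stream: Iterable[Record], k: int, window_size: int, ranges: Sequence[float]
) -> Iterator[List[tuple]]:
    """Buffer tuples in a sliding window; when it is full, publish the oldest
    tuple together with its k - 1 nearest neighbours as one generalized cluster."""
    if window_size < k:
        raise ValueError("window_size must be at least k")
    window: deque = deque()
    for record in stream:
        window.append(record)
        if len(window) < window_size:
            continue
        seed = window.popleft()  # the oldest tuple is due for output
        nearest = sorted(window, key=lambda r: distance(seed, r, ranges))[: k - 1]
        for r in nearest:
            window.remove(r)
        yield generalize([seed, *nearest])
```

With `k=2`, `window_size=3`, and the stream `[(25, 50), (26, None), (90, 5), (88, 7), (91, None)]`, the first cluster pairs `(25, 50)` with the incomplete record `(26, None)`, whose missing cell stays `None` after generalization. Tuples still in the window when the stream ends are not published in this sketch; a fuller version would flush them.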

Publication history

Received: 14 July 2020
Revised: 26 August 2020
Accepted: 01 September 2020
Published: 17 August 2021
Issue date: February 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U19A2081 and 61802270), and the Fundamental Research Funds for the Central Universities (No. 2020SCUNG129).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
