Volume 5, Issue 3


An Optimized Sanitization Approach for Minable Data Publication

Fan Yang, Xiaofeng Liao
College of Computer Science, Chongqing University, Chongqing 400044, China

Abstract

Minable data publication is ubiquitous because it facilitates the sharing and trading of data among commercial companies and supports the development of data-driven tasks. Unfortunately, publishers often release minable data with limited attention to privacy, so the published dataset can also be mined by malicious entities. This risk hinders minable data publication, since the published data may contain sensitive information. Approaches and technologies that reduce the risk of privacy leakage are therefore urgently needed. To this end, we propose an optimized sanitization approach for minable data publication, named SA-MDP. SA-MDP supports association rule mining while providing privacy protection for specific rules. In SA-MDP, we consider the trade-off between data utility and data privacy in the minable data publication problem. To address this problem, SA-MDP uses a customized particle swarm optimization (PSO) algorithm whose optimization objective is determined by both data utility and data privacy. Specifically, new particles are produced either by random mutation or by learning from the best particle, so that SA-MDP can avoid having its solutions trapped in local optima. In addition, we design a fitness function to guide the particles toward the optimal solution, and we apply a preprocessing method before the evolution process of the customized PSO algorithm to improve the convergence rate. Finally, SA-MDP is implemented and evaluated on several datasets. The experimental results demonstrate the effectiveness and efficiency of SA-MDP.
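As a rough illustration of the idea summarized in the abstract, the following Python sketch hides one sensitive association rule by deleting an item from some supporting transactions, and searches for a small set of deletions with a simplified binary swarm in which each new particle is built by learning bits from the best particle or by random mutation. This is not the authors' SA-MDP implementation: the function names (`hide_rule`, `fitness`, `sanitize`), the item-deletion strategy, the penalty used in the fitness, and the probabilities `p_learn` and `p_mutate` are all assumptions made for this sketch, and the preprocessing step mentioned in the abstract is omitted.

```python
import random

def support(db, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(db, lhs, rhs):
    """Confidence of the rule lhs -> rhs (0.0 if lhs never occurs)."""
    s = support(db, lhs)
    return support(db, lhs | rhs) / s if s else 0.0

def sanitize(db, mask, candidates, victim):
    """Return a copy of db with `victim` deleted wherever the mask bit is 1."""
    out = [set(t) for t in db]
    for bit, idx in zip(mask, candidates):
        if bit:
            out[idx].discard(victim)
    return out

def fitness(db, mask, candidates, lhs, rhs, victim, min_conf):
    """Lower is better: deletions (utility loss) plus a privacy penalty.

    The penalty (an assumption of this sketch) exceeds the largest possible
    number of deletions, so any mask that hides the rule beats any that does not.
    """
    new_db = sanitize(db, mask, candidates, victim)
    deleted = sum(mask)
    if confidence(new_db, lhs, rhs) >= min_conf:
        return deleted + len(mask) + 1
    return deleted

def hide_rule(db, lhs, rhs, min_conf, n_particles=20, n_iter=50,
              p_learn=0.5, p_mutate=0.1, seed=0):
    """Search for a deletion mask that hides lhs -> rhs with few deletions."""
    rng = random.Random(seed)
    victim = min(rhs)  # hypothetical choice: always delete one consequent item
    # Only transactions that support the whole rule are candidates for editing.
    candidates = [i for i, t in enumerate(db) if (lhs | rhs) <= t]

    def score(mask):
        return fitness(db, mask, candidates, lhs, rhs, victim, min_conf)

    swarm = [[rng.randint(0, 1) for _ in candidates] for _ in range(n_particles)]
    best = min(swarm, key=score)
    for _ in range(n_iter):
        for k, particle in enumerate(swarm):
            child = []
            for own_bit, best_bit in zip(particle, best):
                r = rng.random()
                if r < p_learn:                # learn from the best particle
                    child.append(best_bit)
                elif r < p_learn + p_mutate:   # random mutation (bit flip)
                    child.append(1 - own_bit)
                else:                          # keep the particle's own bit
                    child.append(own_bit)
            if score(child) <= score(particle):
                swarm[k] = child
        best = min(swarm + [best], key=score)
    return sanitize(db, best, candidates, victim), best
```

For example, on the toy database `[{"a","b","c"}, {"a","b"}, {"a","b","d"}, {"a","c"}, {"b","c"}, {"a","b","c"}]` the rule `{"a"} -> {"b"}` has confidence 0.8, and pushing it below a threshold of 0.5 requires deleting `"b"` from at least two of the four supporting transactions. Calling `hide_rule(db, {"a"}, {"b"}, min_conf=0.5)` returns a sanitized copy of the database together with the chosen deletion mask. Folding privacy into the fitness as a large penalty is only one way to express the utility/privacy trade-off; a weighted sum or a multi-objective formulation would be equally plausible.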

Keywords:

data publication, data sanitization, association rules hiding, evolutionary algorithm
Received: 10 March 2022 Accepted: 20 March 2022 Published: 09 June 2022 Issue date: September 2022
Copyright

© The author(s) 2022.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 61932006), in part by the National Key R&D Program of China (No. 2018AAA0100101), and in part by the Chongqing Technology Innovation and Application Development Project (No. cstc2020jscx-msxmX0156).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).