Open Access

Desensitization of Private Text Dataset Based on Gradient Strategy Trans-WTGAN

School of Cyberspace Security and Key Laboratory of Internet Information Retrieval of Hainan Province, Hainan University, Haikou 570100, China

Abstract

Privacy-sensitive data face considerable security and usability challenges when processed, analyzed, and shared. Traditional desensitization methods for private data often yield low-quality output with limited usability. To address this, a text data desensitization model is proposed that combines a Transformer with a Wasserstein Text convolutional Generative Adversarial Network (Trans-WTGAN). The Transformer serves as the generator, and its self-attention mechanism handles long-range dependencies, enabling the generation of higher-quality text. A Text Convolutional Neural Network (TextCNN) that adopts the Wasserstein approach serves as the discriminator to enhance the stability of model training. The policy gradient scheme from reinforcement learning is used to update the generator parameters, so that the generated data retain the key features of the original data and maintain a certain level of usability. Experimental results indicate that the proposed model outperforms existing methods in text quality and structural consistency, achieves the intended desensitization effect, and preserves the usability of privacy-sensitive data to a certain extent after desensitization. This facilitates simulating development environments that would otherwise require real data, as well as analyzing and sharing data.
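
The policy-gradient update mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' Trans-WTGAN: the generator here is a context-free softmax over a five-token vocabulary rather than a Transformer, and `critic_score` is a hand-written stand-in for a trained Wasserstein-style TextCNN critic. All function names, the `preferred` token set, and the hyperparameters are assumptions made for illustration only. The sketch shows how an unbounded critic score can act as a sequence-level reward in a REINFORCE-style update.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, length, rng):
    """Toy generator: draws tokens independently from one softmax policy.
    (Assumption: a real generator would condition on earlier tokens.)"""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=length)


def critic_score(tokens, preferred):
    """Hypothetical critic: an unbounded real-valued score, higher means the
    sequence looks more like 'real' text. A trained critic would replace this."""
    return sum(1.0 if t in preferred else -1.0 for t in tokens) / len(tokens)


def reinforce_step(logits, preferred, rng, batch=32, length=8, lr=0.5):
    """One policy-gradient (REINFORCE) update using the critic score as reward."""
    probs = softmax(logits)
    samples = [sample_sequence(logits, length, rng) for _ in range(batch)]
    rewards = [critic_score(s, preferred) for s in samples]
    baseline = sum(rewards) / batch  # mean-reward baseline to reduce variance
    grad = [0.0] * len(logits)
    for seq, reward in zip(samples, rewards):
        advantage = reward - baseline
        for tok in seq:
            for k in range(len(logits)):
                # d log softmax(tok) / d logit_k = 1[k == tok] - probs[k]
                grad[k] += advantage * ((1.0 if k == tok else 0.0) - probs[k])
    # Gradient ascent on the expected reward.
    return [z + lr * g / batch for z, g in zip(logits, grad)]


def train(vocab_size=5, preferred=frozenset({0, 1}), steps=200, seed=0):
    """Run the toy loop and return the final token distribution."""
    rng = random.Random(seed)
    logits = [0.0] * vocab_size
    for _ in range(steps):
        logits = reinforce_step(logits, preferred, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this sketch the reward is attached to the whole sampled sequence, which is why a policy-gradient estimator is used instead of backpropagating through the discrete sampling step. In a full model the critic would be trained alternately with the generator rather than fixed.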

Tsinghua Science and Technology
Pages 2081-2096
Cite this article:
Guo Z, Zhou Y, Ye J, et al. Desensitization of Private Text Dataset Based on Gradient Strategy Trans-WTGAN. Tsinghua Science and Technology, 2025, 30(5): 2081-2096. https://doi.org/10.26599/TST.2024.9010155


Received: 17 May 2024
Revised: 30 June 2024
Accepted: 26 August 2024
Published: 29 April 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
