Publications
Open Access Issue
Desensitization of Private Text Dataset Based on Gradient Strategy Trans-WTGAN
Tsinghua Science and Technology 2025, 30(5): 2081-2096
Published: 29 April 2025

Privacy-sensitive data face immense security and usability challenges in processing, analysis, and sharing. Meanwhile, traditional desensitization methods for private data suffer from issues such as poor quality and low usability of the desensitized output. Therefore, a text data desensitization model that combines a Transformer with a Wasserstein Text convolutional Generative Adversarial Network (Trans-WTGAN) is proposed. A Transformer serves as the generator; its self-attention mechanism can handle long-range dependencies, enabling the generation of higher-quality text. A Text Convolutional Neural Network (TextCNN) that integrates the Wasserstein idea serves as the discriminator to enhance the stability of model training. The policy gradient scheme of reinforcement learning is used to update the generator parameters, ensuring that the generated data retain the original key features and maintain a certain level of usability. The experimental results indicate that the proposed model has an advantage over existing methods in terms of text quality and structural consistency, guarantees the desensitization effect, and preserves the usability of the privacy-sensitive data to a certain extent after desensitization. This facilitates simulating a development environment that uses real data, as well as the analysis and sharing of data.
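
The policy gradient update mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the "generator" here is a single softmax over a three-token vocabulary rather than a Transformer, and the critic is a fixed table of unbounded real-valued scores (the hypothetical `CRITIC_SCORES`) standing in for a trained Wasserstein-style TextCNN discriminator. All function names and hyperparameters are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical critic scores per token (assumption, not from the paper).
# A Wasserstein-style critic outputs an unbounded real score, not a probability.
CRITIC_SCORES = [-1.0, 0.0, 2.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index from a categorical distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_step(logits, rng, lr=0.1, batch=32):
    """One REINFORCE update: logits += lr * mean(advantage * grad log pi)."""
    probs = softmax(logits)
    tokens = [sample(probs, rng) for _ in range(batch)]
    rewards = [CRITIC_SCORES[t] for t in tokens]
    baseline = sum(rewards) / batch  # batch-mean baseline reduces variance
    grad = [0.0] * len(logits)
    for t, r in zip(tokens, rewards):
        advantage = r - baseline
        for k in range(len(logits)):
            # d/d logit_k of log softmax(logits)[t] = 1[k == t] - probs[k]
            grad[k] += advantage * ((1.0 if k == t else 0.0) - probs[k])
    return [l + lr * g / batch for l, g in zip(logits, grad)]


def train(steps=200, seed=0):
    """Train the toy generator and return its final token probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(CRITIC_SCORES)
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

In this toy setting the generator shifts probability mass toward the token the critic scores highest. In a sequence model the same update would be applied per generated token, with the reward coming from the critic's score for the generated text.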
