Abstract
Data synthesis under Local Differential Privacy (LDP) is a promising approach to private data analysis and sharing: it requires no trusted aggregator, and the resulting synthetic data can stand in for the raw data in arbitrary analysis tasks. However, the select-measure-generate paradigm used for data synthesis under (central) Differential Privacy (DP) faces particular challenges under LDP, chiefly because LDP requires far more noise than DP, especially for high-dimensional datasets. The “select” step computes correlations between attributes to identify important marginals (attribute pairs) to measure, while the “measure” step estimates the frequency distribution of each selected marginal under LDP. Under LDP, both the correlation estimates and the frequency estimates for multidimensional data often have unsatisfactory utility, since the accuracy of such estimates typically declines as the number of dimensions increases. To address these issues, we propose a two-stage method named FilterLDPSyn. In Stage 1, it filters out ineffective measurements based on one-dimensional frequency and entropy estimates collected under LDP. In Stage 2, it improves the accuracy of the estimated distributions by iteratively collecting two-dimensional values and restoring consistency between the one- and two-dimensional distributions. Experimental results show that FilterLDPSyn outperforms existing approaches.
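The abstract does not state which LDP frequency oracle, filtering rule, or consistency procedure FilterLDPSyn uses. The following is only a minimal Python sketch of the kinds of building blocks the two stages describe, under our own assumptions: one-dimensional frequencies are estimated with generalized randomized response (GRR), a standard LDP frequency oracle; entropy is computed from those estimates; and iterative proportional fitting (IPF) is swapped in as one common way to make a two-dimensional table agree with its one-dimensional marginals. All function names (`grr_perturb`, `grr_estimate`, `entropy`, `mi_upper_bound`, `make_consistent`) are hypothetical, the entropy-based bound is one plausible filtering signal rather than the paper's rule, and `true_freqs` is made-up example data.

```python
import math
import random
from collections import Counter


def grr_perturb(value: int, k: int, eps: float, rng: random.Random) -> int:
    """Client side of generalized randomized response (GRR) over {0, ..., k-1}:
    report the true value with probability p = e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)  # index among the k - 1 non-true values
    return other if other < value else other + 1


def grr_estimate(reports: list[int], k: int, eps: float) -> list[float]:
    """Aggregator side: unbiased GRR frequency estimates, clipped to be
    non-negative and renormalized to sum to 1."""
    n = len(reports)
    e = math.exp(eps)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    counts = Counter(reports)
    # E[count_v / n] = f_v * p + (1 - f_v) * q, solved for f_v.
    raw = [(counts[v] / n - q) / (p - q) for v in range(k)]
    clipped = [max(f, 0.0) for f in raw]
    total = sum(clipped)
    return [f / total for f in clipped] if total > 0 else [1.0 / k] * k


def entropy(freqs: list[float]) -> float:
    """Shannon entropy (in nats) of an estimated 1-D distribution."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def mi_upper_bound(h_a: float, h_b: float) -> float:
    """I(A;B) <= min(H(A), H(B)). Hypothetical filtering signal: a pair whose
    bound is small cannot be strongly correlated."""
    return min(h_a, h_b)


def make_consistent(joint: list[list[float]], row_marg: list[float],
                    col_marg: list[float], iters: int = 50) -> list[list[float]]:
    """Iterative proportional fitting (assumed technique, not necessarily the
    paper's): rescale a 2-D table so its row and column sums match 1-D marginals."""
    table = [row[:] for row in joint]
    for _ in range(iters):
        # Scale each row to its target 1-D marginal.
        for i, target in enumerate(row_marg):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Scale each column to its target 1-D marginal.
        for j, target in enumerate(col_marg):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
    return table


if __name__ == "__main__":
    rng = random.Random(0)
    k, eps, n = 4, 2.0, 100_000
    true_freqs = [0.5, 0.3, 0.15, 0.05]  # made-up example data
    values = rng.choices(range(k), weights=true_freqs, k=n)
    reports = [grr_perturb(v, k, eps, rng) for v in values]
    est = grr_estimate(reports, k, eps)
    print("estimated frequencies:", [round(f, 3) for f in est])
    print("estimated entropy (nats):", round(entropy(est), 3))
    joint = [[0.4, 0.1], [0.1, 0.4]]  # e.g. a noisy 2-D estimate
    print(make_consistent(joint, [0.6, 0.4], [0.5, 0.5]))
```

The bound I(A;B) ≤ min(H(A), H(B)) is why a one-dimensional entropy estimate can, in principle, flag pairs that are not worth measuring: a nearly constant attribute cannot share much mutual information with any other attribute. IPF is shown only because it is a simple, well-known way to reconcile a joint table with its marginals; the paper's Stage 2 procedure may differ.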