Open Access

A Privacy Policy Text Compliance Reasoning Framework with Large Language Models for Healthcare Services

College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
School of Software Technology, Zhejiang University, Ningbo 315048, China
Binjiang Institute of Zhejiang University, Hangzhou 310052, China

Abstract

The advancement of artificial intelligence-generated content is diversifying healthcare services, and healthcare service providers are collecting more private information as a result. Compliance with privacy regulations has therefore become a paramount concern for both regulatory authorities and consumers. Privacy policies are crucial for consumers to understand how their personal information is collected, stored, and processed. In this work, we propose FACTOR, a privacy policy text compliance reasoning framework built on large language models (LLMs). Because the General Data Protection Regulation (GDPR) is broadly applicable, we take Article 13 of the GDPR as the source of regulation requirements. FACTOR segments the privacy policy text using a sliding window strategy and employs LLM-based text entailment to assess compliance for each segment. It then applies a rule-based ensemble approach to aggregate the entailment results across all regulation requirements. Our experiments on a synthetic corpus of 388 privacy policies demonstrate the effectiveness of FACTOR. Additionally, we analyze 100 randomly selected websites offering healthcare services and find that nine of them lack a privacy policy altogether, while 29 have privacy policy texts that fail to meet the regulation requirements.
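The pipeline described in the abstract (sliding-window segmentation, per-segment entailment, rule-based aggregation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The window size and stride, the three requirement names (loose paraphrases of items in GDPR Article 13), the keyword lists, and the aggregation rule ("a requirement is met if any segment entails it; a policy is compliant if every requirement is met") are all assumptions made for illustration. The function `keyword_entails` is a hypothetical stand-in for the LLM-based entailment call, which the abstract does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical requirement names and trigger phrases (assumptions, not from the paper).
REQUIREMENT_KEYWORDS: Dict[str, List[str]] = {
    "controller_identity": ["data controller"],
    "purposes": ["purpose of"],
    "retention_period": ["retain", "storage period"],
}


def sliding_windows(sentences: List[str], size: int = 3, stride: int = 1) -> List[str]:
    """Join consecutive sentences into overlapping text segments."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(sentences) <= size:
        return [" ".join(sentences)] if sentences else []
    last_start = len(sentences) - size
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the final sentences are covered
    return [" ".join(sentences[s:s + size]) for s in starts]


def keyword_entails(segment: str, requirement: str) -> bool:
    """Stand-in for an LLM entailment call: true if any trigger phrase appears."""
    text = segment.lower()
    return any(keyword in text for keyword in REQUIREMENT_KEYWORDS[requirement])


def check_policy(
    sentences: List[str],
    requirements: List[str],
    entails: Callable[[str, str], bool] = keyword_entails,
    size: int = 3,
    stride: int = 1,
) -> Dict[str, bool]:
    """Assumed aggregation rule: a requirement is met if any segment entails it."""
    segments = sliding_windows(sentences, size, stride)
    return {req: any(entails(seg, req) for seg in segments) for req in requirements}


def is_compliant(results: Dict[str, bool]) -> bool:
    """Assumed policy-level rule: every requirement must be met."""
    return all(results.values())
```

In practice, `entails` would be replaced by a function that prompts an LLM with the segment as premise and the requirement text as hypothesis. Overlapping windows help when the evidence for one requirement spans a sentence boundary, at the cost of more entailment calls.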

Tsinghua Science and Technology
Pages 1831-1845
Cite this article:
Chen J, Wang F, Pang S, et al. A Privacy Policy Text Compliance Reasoning Framework with Large Language Models for Healthcare Services. Tsinghua Science and Technology, 2025, 30(4): 1831-1845. https://doi.org/10.26599/TST.2024.9010089

Received: 05 March 2024
Revised: 01 May 2024
Accepted: 08 May 2024
Published: 03 March 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
