Inference Attacks on Genomic Data Based on Probabilistic Graphical Models

Zaobo He; Junxiu Zhou

doi:10.26599/BDMA.2020.9020008

Open Access

∙ Department of Computer Science and Software Engineering, Miami University, Oxford, OH 45011, USA.

∙ Department of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, USA.

Abstract

The rapid progress and plummeting cost of human-genome sequencing have made large amounts of personal biomedical information available, raising one of the most important concerns in the field: genomic data privacy. Because an individual's biomedical data are highly correlated with those of their relatives, the increasing availability of genomes and personal traits online (e.g., leaked unwittingly or released intentionally to genetic service platforms) threatens kin-genomic data privacy. We propose new inference attacks that predict unknown Single Nucleotide Polymorphisms (SNPs) and human traits of individuals in a familial genomic dataset, based on probabilistic graphical models and belief propagation. With this method, an adversary can predict the unobserved genomes or traits of targeted individuals in a family genomic dataset in which some individuals' genomes and traits are observed, relying on SNP-trait associations from Genome-Wide Association Studies (GWAS), Mendel's Laws, and statistical relations between SNPs. Existing genome inference methods have relatively high computational complexity when the input comprises tens of millions of SNPs and human traits. We then propose an approach to publishing genomic data with a differential privacy guarantee. An approximate distribution of the input genomic dataset is first found using Bayesian networks, and a noisy distribution is obtained by injecting noise into this approximate distribution. Finally, a synthetic genomic dataset is sampled from the noisy distribution, and it is proved that any query on the synthetic dataset satisfies the differential privacy guarantee.
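To make the kin-inference idea concrete, the following is a minimal sketch, not the authors' method. It infers one unobserved parent's genotype at a single SNP from an observed child and the other parent, using only Mendel's law of segregation. Several things here are assumptions introduced for illustration: the function names, the 0/1/2 genotype encoding (count of minor alleles), and the Hardy-Weinberg prior parameterized by a minor allele frequency `maf`. The posterior is computed by direct enumeration rather than belief propagation, and the SNP-trait and SNP-SNP correlation factors mentioned in the abstract are omitted.

```python
GENOTYPES = (0, 1, 2)  # assumed encoding: number of minor alleles at one SNP


def hardy_weinberg_prior(maf):
    """Prior over genotypes, assuming Hardy-Weinberg equilibrium (an assumption)."""
    return {0: (1 - maf) ** 2, 1: 2 * maf * (1 - maf), 2: maf ** 2}


def mendel(child, father, mother):
    """P(child genotype | parents' genotypes) under Mendel's law of segregation."""
    pf = father / 2  # probability the father transmits the minor allele
    pm = mother / 2  # probability the mother transmits the minor allele
    table = {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }
    return table[child]


def posterior_father(child, mother, maf):
    """Posterior over the father's hidden genotype given child and mother."""
    prior = hardy_weinberg_prior(maf)
    # Unnormalized score for each candidate genotype: prior times likelihood.
    scores = {f: prior[f] * mendel(child, f, mother) for f in GENOTYPES}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observed genotypes are inconsistent with Mendel's laws")
    return {f: s / total for f, s in scores.items()}


if __name__ == "__main__":
    print(posterior_father(child=2, mother=1, maf=0.2))
```

In this example a child with two minor alleles rules out a father with none, and the remaining probability mass splits roughly 0.8 to 0.2 between the heterozygous and homozygous-minor genotypes. On a full pedigree with many SNPs, enumeration of this kind becomes infeasible, which is the setting where message passing on a factor graph is used instead.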

Keywords

Single Nucleotide Polymorphism (SNP)-trait association; belief propagation; factor graph; data sanitization

Big Data Mining and Analytics, Volume 3, Issue 3, September 2020, Pages 225-233

Cite this article:

He Z, Zhou J. Inference Attacks on Genomic Data Based on Probabilistic Graphical Models. Big Data Mining and Analytics, 2020, 3(3): 225-233. https://doi.org/10.26599/BDMA.2020.9020008

Received: 10 May 2020

Accepted: 24 June 2020

Published: 16 July 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).