Scholar - SciOpen

Many human diseases involve multiple genes in complex interactions. Large Genome-Wide Association Studies (GWASs) have been considered to hold promise for unraveling such interactions. However, statistical tests for high-order epistatic interactions ($\geq 2$ Single Nucleotide Polymorphisms (SNPs)) raise enormous computational and analytical challenges. It is well known that a block-wise structure exists in the human genome due to Linkage Disequilibrium (LD) between adjacent SNPs. In this paper, we propose a novel Bayesian method, named BAM, for simultaneously partitioning SNPs into LD-blocks and detecting genome-wide multi-locus epistatic interactions that are associated with multiple diseases. Experimental results on simulated datasets demonstrate that BAM is powerful and efficient. We also applied BAM to two GWAS datasets from WTCCC, i.e., Rheumatoid Arthritis and Type 1 Diabetes, and accurately recovered the LD-block structure. Therefore, we believe that BAM is suitable and efficient for the full-scale analysis of multi-disease-related interactions in GWASs.

Open Access Issue

Bayesian Analysis of Complex Mutations in HBV, HCV, and HIV Studies

Bing Liu, Shishi Feng, Xuan Guo, Jing Zhang

Big Data Mining and Analytics 2019, 2(3): 145-158

Published: 04 April 2019

Abstract

In this article, we aim to provide a thorough review of the Bayesian-inference-based methods applied to Hepatitis B Virus (HBV), Hepatitis C Virus (HCV), and Human Immunodeficiency Virus (HIV) studies, with a focus on the detection of viral mutations and the various problems related to these mutations. Interacting mutation patterns are particularly difficult to detect and interpret, but Bayesian statistical modeling provides a groundbreaking opportunity to solve these problems. Here we summarize Bayesian statistical approaches, including the Bayesian Variable Partition (BVP) model, the Bayesian Network (BN), and the Recursive Model Selection (RMS) procedure, which are designed to detect the mutations and to make further inferences about the comprehensive dependence structure among the interactions. BVP, BN, and RMS, all of which use Markov Chain Monte Carlo (MCMC) methods, have been widely applied in HBV, HCV, and HIV studies in recent years. We also summarize the applications of these Bayesian methods to studies of these viruses, in which several important and useful results have been discovered. We envisage that further modified Bayesian methods will be applied to other infectious diseases and cancer cells, yielding critical medical results before long.

Open Access Issue

A Survey of SNP Data Analysis

Xiaojun Ding, Xuan Guo

Big Data Mining and Analytics 2018, 1(3): 173-190

Published: 24 May 2018

Abstract

Every person differs from every other person regarding their physical appearance, susceptibility to disease, response to medications, and so on. However, 99.9 percent of human DNA is the same. As such, differences in human genomes are very worthy of study. Single-Nucleotide Polymorphisms (SNPs) are the simplest form and most common source of genetic polymorphism. SNPs have been used to successfully identify defective genes that cause Mendelian diseases. However, most common human diseases are complex and are caused by multiple SNPs. Each SNP explains only a small fraction of genetic causes. Experiments on individual SNPs may reveal their non-detectable effects on complex diseases. Pathogenesis is a complicated topic, and it is difficult to correctly predict multiple SNPs. As such, the analysis of SNP data is a critical task in the study of genetic diseases. In this paper, we divide the methods for genome-wide SNP data analysis into two categories: single-trait Genome-Wide Association Studies (GWAS) in which pathology is mined from data of a single phenotype, and multiple-trait GWAS which identifies cross-phenotype associations. For single-trait GWAS, we review methods ranging from the simple to the complex, including TEAM, BOOST, AntEpiSeeker, SNPRuler, EDCF, HiSeeker, ORF, MLR-tagging, MSCD, and MIC. For multiple-trait GWAS, we describe methods in terms of their employed regression models, dimension-reduction methods, and meta-analysis methods. We also list the advantages and disadvantages of these methods. Finally, we discuss the future directions of SNP data analysis for genome-wide association.
