Volume 19, Issue 6


Genome-Wide Interaction-Based Association of Human Diseases — A Survey

Xuan Guo, Ning Yu, Feng Gu, Xiaojun Ding, Jianxin Wang, Yi Pan
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.
Department of Computer Science, College of Staten Island, Staten Island, NY 10314, USA.
Central South University, Changsha 410083, China.

Abstract

Genome-Wide Association Studies (GWASs) aim to identify genetic variants associated with disease by assaying and analyzing hundreds of thousands of Single Nucleotide Polymorphisms (SNPs). Although traditional single-locus statistical approaches have been standardized and have led to many interesting findings, a substantial number of recent GWASs indicate that, for most disorders, individual SNPs explain only a small fraction of the genetic causes. Consequently, exploring multi-SNP interactions in the hope of discovering more significant associations has attracted increasing attention. Because the search space for complicated multi-locus interactions is huge, many fast and effective methods have recently been proposed for detecting disease-associated epistatic interactions using GWAS data. In this paper, we provide a critical review and comparison of eight popular methods, i.e., BOOST, TEAM, epiForest, EDCF, SNPHarvester, epiMODE, MECPM, and MIC, which are used for detecting gene-gene interactions among genetic loci. In view of the assumed data models and search strategies, we divide the methods into seven categories. Moreover, the evaluation methodologies, including detection power, disease models for simulation, sources of real GWAS data, and the control of the false discovery rate, are elaborated as references for developers of new approaches. At the end of the paper, we summarize the methods and discuss future directions in genome-wide association studies for detecting epistatic interactions.

Keywords: epistasis, Single Nucleotide Polymorphism (SNP), genome-wide association, epistatic interaction, complex disease


Publication history

Received: 16 June 2014
Accepted: 23 June 2014
Published: 20 November 2014
Issue date: December 2014

Copyright

The Author(s)

Acknowledgements

This study was supported by the Molecular Basis of Disease (MBD) program at Georgia State University. This work was also supported in part by the National Natural Science Foundation of China (Nos. 61379108 and 61232001).
