A Survey of SNP Data Analysis

Xiaojun Ding; Xuan Guo

doi:10.26599/BDMA.2018.9020015


Open Access


∙ School of Computer Science and Engineering, Yulin Normal University, Yulin 537000, and School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China.

∙ Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203-5017, USA.


Abstract

Every person differs from every other person in physical appearance, susceptibility to disease, response to medications, and so on, yet 99.9 percent of human DNA is identical across individuals. The remaining differences in human genomes are therefore well worth studying. Single-Nucleotide Polymorphisms (SNPs) are the simplest form and the most common source of genetic polymorphism. SNPs have been used successfully to identify defective genes that cause Mendelian diseases. However, most common human diseases are complex and are caused by multiple SNPs, each of which explains only a small fraction of the genetic causes. Experiments on individual SNPs may therefore fail to detect effects that only appear when several SNPs act together. Because pathogenesis is complicated, correctly identifying the multiple SNPs involved is difficult, which makes the analysis of SNP data a critical task in the study of genetic diseases. In this paper, we divide the methods for genome-wide SNP data analysis into two categories: single-trait Genome-Wide Association Studies (GWAS), in which pathology is mined from data on a single phenotype, and multiple-trait GWAS, which identify cross-phenotype associations. For single-trait GWAS, we review methods ranging from the simple to the complex, including TEAM, BOOST, AntEpiSeeker, SNPRuler, EDCF, HiSeeker, ORF, MLR-tagging, MSCD, and MIC. For multiple-trait GWAS, we describe methods in terms of the regression models, dimension-reduction methods, and meta-analysis methods they employ. We also list the advantages and disadvantages of these methods. Finally, we discuss future directions of SNP data analysis for genome-wide association.
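Many of the single-trait methods named above are designed to make the search over SNP combinations tractable. As a point of reference for what they speed up or prune, the following is a minimal sketch, not taken from the paper and not an implementation of TEAM, BOOST, or any other surveyed method, of a naive exhaustive two-locus scan. For every SNP pair it tabulates the (up to nine) joint genotype combinations against case/control status, applies a Pearson chi-square test, and keeps pairs below a Bonferroni-corrected threshold. It assumes genotypes coded 0/1/2 (minor-allele counts) and binary status (1 = case, 0 = control) with both groups present, and uses only the Python standard library; all function names and the toy dataset are hypothetical.

```python
import itertools
import math
from collections import Counter


def _gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x): series or continued fraction."""
    if x <= 0:
        return 1.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / (a + n)
            total += term
        return max(0.0, 1.0 - math.exp(log_prefactor) * total)
    # Modified Lentz evaluation of the continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    return _gamma_q(df / 2.0, stat / 2.0)


def pair_chi2(snp_a, snp_b, status):
    """Pearson chi-square test: joint genotype (rows) vs. control/case (columns)."""
    n = len(status)
    n_case = sum(status)
    col_totals = (n - n_case, n_case)  # (controls, cases)
    counts = Counter(zip(snp_a, snp_b, status))
    combos = sorted({(ga, gb) for ga, gb, _ in counts})  # observed combinations only
    stat = 0.0
    for ga, gb in combos:
        row = (counts[(ga, gb, 0)], counts[(ga, gb, 1)])
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = len(combos) - 1  # (rows - 1) * (2 - 1)
    return stat, df, chi2_sf(stat, df)


def exhaustive_pair_scan(genotypes, status, alpha=0.05):
    """Test all SNP pairs; return (i, j, stat, p) below alpha / number_of_pairs."""
    m = len(genotypes)
    n_tests = m * (m - 1) // 2
    hits = []
    for i, j in itertools.combinations(range(m), 2):
        stat, _, p = pair_chi2(genotypes[i], genotypes[j], status)
        if p < alpha / n_tests:
            hits.append((i, j, stat, p))
    # Smallest p first; ties (e.g. p underflowing to 0) broken by larger statistic.
    return sorted(hits, key=lambda h: (h[3], -h[2]))


def toy_dataset():
    """Hypothetical data: 90 people; case iff SNP0 + SNP1 is odd; SNP2 is filler."""
    snp0, snp1, status = [], [], []
    for ga in range(3):
        for gb in range(3):
            for _ in range(10):
                snp0.append(ga)
                snp1.append(gb)
                status.append((ga + gb) % 2)
    snp2 = [k % 3 for k in range(len(status))]
    return [snp0, snp1, snp2], status


if __name__ == "__main__":
    genotypes, status = toy_dataset()
    for i, j, stat, p in exhaustive_pair_scan(genotypes, status):
        print(f"SNP{i} x SNP{j}: chi2 = {stat:.1f}, p = {p:.2e}")
```

On the toy data only the SNP0 x SNP1 pair passes the threshold, with a chi-square statistic of 90 on 8 degrees of freedom. The loop performs m(m-1)/2 tests, so with roughly 500,000 SNPs it would run about 1.25 × 10^11 of them; that quadratic cost, and the even faster growth for higher-order combinations, is what motivates the more efficient search strategies surveyed in the paper. The Bonferroni correction is a simple, conservative choice made here for illustration only.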

Keywords

SNP interactions; SNP combinations; GWAS; case-control study; disease association analysis; cross-phenotype association studies


Big Data Mining and Analytics, Volume 1, Issue 3, September 2018, pp. 173-190

Cite this article: Ding X, Guo X. A Survey of SNP Data Analysis. Big Data Mining and Analytics, 2018, 1(3): 173-190. https://doi.org/10.26599/BDMA.2018.9020015


Received: 12 January 2018

Accepted: 17 January 2018

Published: 24 May 2018